Data Info¶

########################################

The ExtraSensory Dataset

Primary data files (features and labels)

########################################

The ExtraSensory Dataset was collected by Yonatan Vaizman and Katherine Ellis, with the supervision of Gert Lanckriet. Department of Electrical and Computer Engineering, University of California, San Diego.

The dataset is publicly available. Any usage of the dataset for publications requires citing the official paper that introduced the dataset: Vaizman, Y., Ellis, K., and Lanckriet, G. "Recognizing Detailed Human Context In-the-Wild from Smartphones and Smartwatches". IEEE Pervasive Computing, vol. 16, no. 4, October-December 2017, pp. 62-74. doi:10.1109/MPRV.2017.3971131 (In the website, we refer to this original paper as Vaizman2017a)

########################################

Content of the primary data files: There are 60 'csv.gz' files, one for each participant (user, subject) in the data collection. Each of these files has a filename of the form [UUID].features_labels.csv.gz, where each user has a unique (randomly generated) universally unique identifier (UUID). Each file is a textual CSV file, compressed using the gzip format.

Within every user's CSV file:¶

  • The first row specifies the columns of the file.
  • Every other row refers to an example from the user. The examples are sorted according to the primary key - the timestamp.
  • The columns:

-- The first column is 'timestamp', represented as the standard number of seconds since the epoch.

-- Second come the columns for the extracted features. Unavailable features are represented with 'nan'. The name of each feature references the sensor it was extracted from, in the form [sensor_name]:[feature_name]. The current version contains features from the following sensors:
--- raw_acc: Accelerometer from the phone. The 'raw' version of acceleration (as opposed to the decomposed versions of gravity and user-acceleration).
--- proc_gyro: Gyroscope from the phone. Processed version of gyroscope measurements (the OS calculates a version that removes drift).
--- raw_magnet: Magnetometer from the phone. Raw version (as opposed to the bias-fixed version that the OS also provides).
--- watch_acceleration: Accelerometer from the watch.
--- watch_heading: Heading from the compass on the watch.
--- location: Location services. These features were extracted offline for every example from the sequence of latitude-longitude-altitude updates from the example's minute. They describe only relative location (not absolute location in the world), i.e. the variability of movement within the minute.
--- location_quick_features: Location services. These features were calculated on the phone when data was collected. They are available even in cases where the other location features are not, because the user chose to conceal their absolute location coordinates. These quick features are simple heuristics that approximate the more thorough offline features.
--- audio_naive: Microphone. These naive features are simply averages and standard deviations of the 13 MFCCs from the ~20sec recording window of every example.
--- discrete: Phone-state. These are binary indicators for the state of the phone. Notice that time_of_day features are also considered phone-state features (they also have the prefix 'discrete:'), but their columns do not appear right after the other 'discrete' columns.
--- lf_measurements: Various sensors that were recorded in low frequency (meaning once per example).

-- Third come the columns for the ground-truth labels. The values are either 1 (label is relevant for the example), 0 (label is not relevant for the example), or 'nan' (label is considered 'missing' for this example). Originally, users could only report 'positive' labels (in the original ExtraSensory paper, Vaizman2017a, we assumed that when a label was not reported it was a 'negative' example). This cleaned version of the labels has the notion of 'missing labels'; details about how we inferred missing label information are provided in the second paper, Vaizman2017b (see http://extrasensory.ucsd.edu for updated references). The names of the labels have the prefix 'label:'. After the prefix:
--- If the label name is all capitalized, it is an original label from the mobile app's interface, and the values were taken from what the user originally reported.
--- If the label name begins with 'FIX_', this is a fixed/cleaned version of a corresponding label, meaning that the researchers fixed some of the values reported by users because of inconsistencies.
--- If the label name begins with 'OR_', this is a synthesized label, meaning it did not appear in the app's label menu; the researchers created it as a combination (using logical or) of other related labels.
--- If the label name begins with 'LOC_', this is a fixed/cleaned version of a corresponding label that was fixed by the researchers based on absolute location. LOC_beach was based on the original label 'AT_THE_BEACH', LOC_home on 'AT_HOME', and LOC_main_workplace on 'AT_WORK'.

-- Fourth, the last column is label_source, describing where the original labeling came from in the mobile app's interface. It has the following possible values:
-1: The user did not report any labels for this example (notice, however, that this example may still have labeling for the 'LOC_' labels).
0: The user used the 'active feedback' interface (reporting the immediate future). This example is the first in the relevant minute sequence.
1: The user used the 'active feedback' interface. This example is a continuation of a sequence of minutes since the user started the reported context.
2: The user used the history interface to label an example from the past.
3: The user replied to a notification that simply asked to provide any labels.
4: The user replied to a notification that asked 'In the past [minutes] minutes were you still [recent context]?'. The user replied 'correct' on the phone.
5: The user replied to a notification that asked 'In the past [minutes] minutes were you still [recent context]?'. The user replied 'not exactly' and then corrected the context labels.
6: The user replied to a notification that asked 'In the past [minutes] minutes were you still [recent context]?'. The user replied 'correct' on the watch interface.
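
For concreteness, here is a minimal sketch (not part of the original notebook) of reading one of these files and splitting its columns into the groups described above. The filename below is just an example UUID; substitute any file from the dataset.

import pandas as pd

# Example file; any [UUID].features_labels.csv.gz from the dataset works here
example_file = '81536B0A-8DBF-4D8A-AC24-9543E2E4C8E0.features_labels.csv.gz'
df = pd.read_csv(example_file)  # pandas infers gzip compression from the extension

# Column groups as described above
feature_cols = [c for c in df.columns if c != 'timestamp' and not c.startswith('label')]
label_cols = [c for c in df.columns if c.startswith('label:')]

# Label semantics: 1 = relevant, 0 = not relevant, NaN = missing
labels = df[label_cols]
missing_mask = labels.isna()

# Features of a single sensor can be selected by prefix, e.g. the phone accelerometer
raw_acc_cols = [c for c in df.columns if c.startswith('raw_acc:')]
print(len(feature_cols), len(label_cols), len(raw_acc_cols))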

########################################

Data Consolidation¶

Getting the data from all the different files into one csv.

Game plan from here: build a new CSV holding both the generated labels and the real labels for the model I'm working on (an LSTM that predicts the next user data and the next activity, creating a row for each prediction), then concatenate everything into a single CSV (the files don't have to be renamed beforehand).

Open question: one combined CSV, or two (real labels vs. generated labels)?

Possibly use the labels to predict the next user data.

In [ ]:
# Standard library imports
import gzip
import os
import pickle
import shutil
import zipfile

# Third-party imports
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf
from IPython.display import Markdown, display
from keras.layers import Dense, Dropout, LSTM
from keras.models import Sequential
from keras.preprocessing.sequence import TimeseriesGenerator
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from skmultilearn.problem_transform import ClassifierChain
from tensorflow.keras.preprocessing.sequence import pad_sequences
In [ ]:
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
In [ ]:
# Making sure ExtraSensory.per_uuid_features_labels.zip exists and is unzipped

def unzip(zip_file):
    # Extract to the directory obtained from the zip file name
    zip_extract_to = zip_file.replace('.zip', '')

    # Unzipping
    if os.path.exists(zip_file):
        if not os.path.exists(zip_extract_to):
            os.makedirs(zip_extract_to)
            with zipfile.ZipFile(zip_file, 'r') as zip_ref:
                zip_ref.extractall(zip_extract_to)
            message = "Unzipped successfully."
        else:
            message = "Directory already exists. File might be unzipped."
    else:
        message = "Zip file not found."

    print(message)
    return zip_extract_to


def csv_extract(zip_extract_to):
    # Directory where the extracted .csv files will be saved
    unzipped_data_dir = f"{zip_extract_to}-Unzipped"

    # Create the unzipped data directory if it does not exist
    if not os.path.exists(unzipped_data_dir):
        os.makedirs(unzipped_data_dir)

    # Extracting .csv.gz files
    extraction_message = ""
    if os.path.exists(zip_extract_to):
        for file in os.listdir(zip_extract_to):
            if file.endswith('.gz'):
                gz_file_path = os.path.join(zip_extract_to, file)
                csv_file_path = os.path.join(unzipped_data_dir, file[:-3])  # Removing '.gz' from filename

                try:
                    with gzip.open(gz_file_path, 'rb') as f_in:
                        with open(csv_file_path, 'wb') as f_out:
                            shutil.copyfileobj(f_in, f_out)
                    extraction_message += f"Extracted {file}\n"
                except Exception as e:
                    extraction_message += f"Error extracting {file}: {e}\n"
    else:
        extraction_message = "Directory with .gz files not found."

    print(extraction_message.strip())

    return unzipped_data_dir


# Function to extract user_id from filename
def extract_user_id(filename):
    return filename.split('.')[0]

def make_one_csv(unzipped_data_dir, COMBINED_FILE):
    # Combining all CSVs into one dataframe
    combined_csv_data = pd.DataFrame()

    if os.path.exists(unzipped_data_dir):
        for file in os.listdir(unzipped_data_dir):
            if file.endswith('.csv'):
                file_path = os.path.join(unzipped_data_dir, file)
                user_id = extract_user_id(file)

                # Read the CSV file and add the user_id column
                csv_data = pd.read_csv(file_path)
                csv_data['user_id'] = user_id

                # Append to the combined dataframe
                combined_csv_data = pd.concat([combined_csv_data, csv_data], ignore_index=True)

                
                #print(f"Processed file: {file} \nCurrent size of combined data: {combined_csv_data.shape}")


        # Check if any data has been combined
        if not combined_csv_data.empty:
            # Save the combined CSV data to a file
            combined_csv_data.to_csv(COMBINED_FILE, index=False)
            print(f"Combined CSV file created at {COMBINED_FILE}.")
        else:
            print("No CSV files found to combine or combined data is empty.")
    else:
        print("Directory with unzipped CSV files not found.")
    return COMBINED_FILE
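
A side note on make_one_csv: calling pd.concat inside the loop copies the growing DataFrame on every iteration, which gets slow for 60 files. A minimal alternative sketch (same behavior assumed, reusing extract_user_id from above) collects the frames first and concatenates once:

def make_one_csv_fast(unzipped_data_dir, combined_file):
    # Collect each user's DataFrame, then concatenate once at the end
    frames = []
    for file in os.listdir(unzipped_data_dir):
        if file.endswith('.csv'):
            csv_data = pd.read_csv(os.path.join(unzipped_data_dir, file))
            csv_data['user_id'] = extract_user_id(file)
            frames.append(csv_data)
    if frames:
        pd.concat(frames, ignore_index=True).to_csv(combined_file, index=False)
        print(f"Combined CSV file created at {combined_file}.")
    return combined_file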
In [ ]:
COMBINED_FILE = 'ExtraSensory_Combined_User_Data.csv'
if not os.path.exists(COMBINED_FILE):
    # Path of the zip file
    zip_file = 'ExtraSensory.per_uuid_features_labels.zip'
    zip_extract_to = unzip(zip_file)
    unzipped_data_dir = csv_extract(zip_extract_to)
    make_one_csv(unzipped_data_dir, COMBINED_FILE)
else:
    print('Combined file already exists.')
Combined file already exists.

Data Exploration¶

In [ ]:
combined_csv_data = pd.read_csv(COMBINED_FILE)
combined_csv_data['timestamp'] = pd.to_datetime(combined_csv_data['timestamp'], unit='s')
In [ ]:
print(combined_csv_data.columns)

# user_id lets us keep track of which user each row of data came from.
Index(['timestamp', 'raw_acc:magnitude_stats:mean',
       'raw_acc:magnitude_stats:std', 'raw_acc:magnitude_stats:moment3',
       'raw_acc:magnitude_stats:moment4',
       'raw_acc:magnitude_stats:percentile25',
       'raw_acc:magnitude_stats:percentile50',
       'raw_acc:magnitude_stats:percentile75',
       'raw_acc:magnitude_stats:value_entropy',
       'raw_acc:magnitude_stats:time_entropy',
       ...
       'label:ELEVATOR', 'label:OR_standing', 'label:AT_SCHOOL',
       'label:PHONE_IN_HAND', 'label:PHONE_IN_BAG', 'label:PHONE_ON_TABLE',
       'label:WITH_CO-WORKERS', 'label:WITH_FRIENDS', 'label_source',
       'user_id'],
      dtype='object', length=279)
In [ ]:
# Trying to understand columns
def build_hierarchy(columns):
    # Build a nested dictionary representing the hierarchy of columns.
    hierarchy = {}
    for col in columns:
        parts = col.split(':')
        current_level = hierarchy

        for part in parts[:-1]:
            current_level = current_level.setdefault(part, {})
        
        current_level[parts[-1]] = col

    return hierarchy

def format_hierarchy(hierarchy, indent=0):
    # Format the hierarchy into a readable string with indentation.
    result = ""
    for key, value in hierarchy.items():
        prefix = "  " * indent + "- "
        if isinstance(value, dict):
            result += f"{prefix}{key}:\n{format_hierarchy(value, indent + 1)}"
        else:
            result += f"{prefix} {key}\n"
    return result

# Building and formatting the hierarchy
hierarchy = build_hierarchy(combined_csv_data.columns)
formatted_hierarchy = format_hierarchy(hierarchy)
print(formatted_hierarchy)
-  timestamp
- raw_acc:
  - magnitude_stats:
    -  mean
    -  std
    -  moment3
    -  moment4
    -  percentile25
    -  percentile50
    -  percentile75
    -  value_entropy
    -  time_entropy
  - magnitude_spectrum:
    -  log_energy_band0
    -  log_energy_band1
    -  log_energy_band2
    -  log_energy_band3
    -  log_energy_band4
    -  spectral_entropy
  - magnitude_autocorrelation:
    -  period
    -  normalized_ac
  - 3d:
    -  mean_x
    -  mean_y
    -  mean_z
    -  std_x
    -  std_y
    -  std_z
    -  ro_xy
    -  ro_xz
    -  ro_yz
- proc_gyro:
  - magnitude_stats:
    -  mean
    -  std
    -  moment3
    -  moment4
    -  percentile25
    -  percentile50
    -  percentile75
    -  value_entropy
    -  time_entropy
  - magnitude_spectrum:
    -  log_energy_band0
    -  log_energy_band1
    -  log_energy_band2
    -  log_energy_band3
    -  log_energy_band4
    -  spectral_entropy
  - magnitude_autocorrelation:
    -  period
    -  normalized_ac
  - 3d:
    -  mean_x
    -  mean_y
    -  mean_z
    -  std_x
    -  std_y
    -  std_z
    -  ro_xy
    -  ro_xz
    -  ro_yz
- raw_magnet:
  - magnitude_stats:
    -  mean
    -  std
    -  moment3
    -  moment4
    -  percentile25
    -  percentile50
    -  percentile75
    -  value_entropy
    -  time_entropy
  - magnitude_spectrum:
    -  log_energy_band0
    -  log_energy_band1
    -  log_energy_band2
    -  log_energy_band3
    -  log_energy_band4
    -  spectral_entropy
  - magnitude_autocorrelation:
    -  period
    -  normalized_ac
  - 3d:
    -  mean_x
    -  mean_y
    -  mean_z
    -  std_x
    -  std_y
    -  std_z
    -  ro_xy
    -  ro_xz
    -  ro_yz
  -  avr_cosine_similarity_lag_range0
  -  avr_cosine_similarity_lag_range1
  -  avr_cosine_similarity_lag_range2
  -  avr_cosine_similarity_lag_range3
  -  avr_cosine_similarity_lag_range4
- watch_acceleration:
  - magnitude_stats:
    -  mean
    -  std
    -  moment3
    -  moment4
    -  percentile25
    -  percentile50
    -  percentile75
    -  value_entropy
    -  time_entropy
  - magnitude_spectrum:
    -  log_energy_band0
    -  log_energy_band1
    -  log_energy_band2
    -  log_energy_band3
    -  log_energy_band4
    -  spectral_entropy
  - magnitude_autocorrelation:
    -  period
    -  normalized_ac
  - 3d:
    -  mean_x
    -  mean_y
    -  mean_z
    -  std_x
    -  std_y
    -  std_z
    -  ro_xy
    -  ro_xz
    -  ro_yz
  - spectrum:
    -  x_log_energy_band0
    -  x_log_energy_band1
    -  x_log_energy_band2
    -  x_log_energy_band3
    -  x_log_energy_band4
    -  y_log_energy_band0
    -  y_log_energy_band1
    -  y_log_energy_band2
    -  y_log_energy_band3
    -  y_log_energy_band4
    -  z_log_energy_band0
    -  z_log_energy_band1
    -  z_log_energy_band2
    -  z_log_energy_band3
    -  z_log_energy_band4
  - relative_directions:
    -  avr_cosine_similarity_lag_range0
    -  avr_cosine_similarity_lag_range1
    -  avr_cosine_similarity_lag_range2
    -  avr_cosine_similarity_lag_range3
    -  avr_cosine_similarity_lag_range4
- watch_heading:
  -  mean_cos
  -  std_cos
  -  mom3_cos
  -  mom4_cos
  -  mean_sin
  -  std_sin
  -  mom3_sin
  -  mom4_sin
  -  entropy_8bins
- location:
  -  num_valid_updates
  -  log_latitude_range
  -  log_longitude_range
  -  min_altitude
  -  max_altitude
  -  min_speed
  -  max_speed
  -  best_horizontal_accuracy
  -  best_vertical_accuracy
  -  diameter
  -  log_diameter
- location_quick_features:
  -  std_lat
  -  std_long
  -  lat_change
  -  long_change
  -  mean_abs_lat_deriv
  -  mean_abs_long_deriv
- audio_naive:
  - mfcc0:
    -  mean
    -  std
  - mfcc1:
    -  mean
    -  std
  - mfcc2:
    -  mean
    -  std
  - mfcc3:
    -  mean
    -  std
  - mfcc4:
    -  mean
    -  std
  - mfcc5:
    -  mean
    -  std
  - mfcc6:
    -  mean
    -  std
  - mfcc7:
    -  mean
    -  std
  - mfcc8:
    -  mean
    -  std
  - mfcc9:
    -  mean
    -  std
  - mfcc10:
    -  mean
    -  std
  - mfcc11:
    -  mean
    -  std
  - mfcc12:
    -  mean
    -  std
- audio_properties:
  -  max_abs_value
  -  normalization_multiplier
- discrete:
  - app_state:
    -  is_active
    -  is_inactive
    -  is_background
    -  missing
  - battery_plugged:
    -  is_ac
    -  is_usb
    -  is_wireless
    -  missing
  - battery_state:
    -  is_unknown
    -  is_unplugged
    -  is_not_charging
    -  is_discharging
    -  is_charging
    -  is_full
    -  missing
  - on_the_phone:
    -  is_False
    -  is_True
    -  missing
  - ringer_mode:
    -  is_normal
    -  is_silent_no_vibrate
    -  is_silent_with_vibrate
    -  missing
  - wifi_status:
    -  is_not_reachable
    -  is_reachable_via_wifi
    -  is_reachable_via_wwan
    -  missing
  - time_of_day:
    -  between0and6
    -  between3and9
    -  between6and12
    -  between9and15
    -  between12and18
    -  between15and21
    -  between18and24
    -  between21and3
- lf_measurements:
  -  light
  -  pressure
  -  proximity_cm
  -  proximity
  -  relative_humidity
  -  battery_level
  -  screen_brightness
  -  temperature_ambient
- label:
  -  LYING_DOWN
  -  SITTING
  -  FIX_walking
  -  FIX_running
  -  BICYCLING
  -  SLEEPING
  -  LAB_WORK
  -  IN_CLASS
  -  IN_A_MEETING
  -  LOC_main_workplace
  -  OR_indoors
  -  OR_outside
  -  IN_A_CAR
  -  ON_A_BUS
  -  DRIVE_-_I_M_THE_DRIVER
  -  DRIVE_-_I_M_A_PASSENGER
  -  LOC_home
  -  FIX_restaurant
  -  PHONE_IN_POCKET
  -  OR_exercise
  -  COOKING
  -  SHOPPING
  -  STROLLING
  -  DRINKING__ALCOHOL_
  -  BATHING_-_SHOWER
  -  CLEANING
  -  DOING_LAUNDRY
  -  WASHING_DISHES
  -  WATCHING_TV
  -  SURFING_THE_INTERNET
  -  AT_A_PARTY
  -  AT_A_BAR
  -  LOC_beach
  -  SINGING
  -  TALKING
  -  COMPUTER_WORK
  -  EATING
  -  TOILET
  -  GROOMING
  -  DRESSING
  -  AT_THE_GYM
  -  STAIRS_-_GOING_UP
  -  STAIRS_-_GOING_DOWN
  -  ELEVATOR
  -  OR_standing
  -  AT_SCHOOL
  -  PHONE_IN_HAND
  -  PHONE_IN_BAG
  -  PHONE_ON_TABLE
  -  WITH_CO-WORKERS
  -  WITH_FRIENDS
-  label_source
-  user_id

In [ ]:
# List of label columns to check
label_columns = [col for col in combined_csv_data.columns if col.startswith("label:")]

# Assume negatives for unreported ground-truth labels (missing values treated as 0)
combined_csv_data[label_columns] = combined_csv_data[label_columns].fillna(0)

combined_csv_data['label_sum_initial'] = combined_csv_data[label_columns].sum(axis=1)
combined_csv_data['label:UNKNOWN'] = (combined_csv_data['label_sum_initial'] == 0).astype(float)
label_columns.append('label:UNKNOWN')
combined_csv_data = combined_csv_data.drop('label_sum_initial', axis=1)
df = combined_csv_data.copy()


# Function to find the label name with value 1
def find_label_name(row):
    for col in label_columns:
        if row[col] == 1:
            return col.split("label:")[1]
    return None
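
As a quick sanity check (a usage sketch added here, not part of the original analysis), find_label_name can be applied row-wise to see the first positive label of a few examples:

# axis=1 passes each row to find_label_name; rows with no positive label return None
print(df.head(5).apply(find_label_name, axis=1))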
In [ ]:
# Checking ground truth labels for value counts
column_sums = df[label_columns].sum()
column_sums_sorted = column_sums.sort_values(ascending=True)

# Plot setup
plt.figure(figsize=(10, 14)) 
column_sums_sorted.plot(kind='barh')
plt.title('Number of Positive Examples per Label')
plt.xlabel('Count')
plt.ylabel('Label')
plt.tight_layout()
plt.show()
[Figure: horizontal bar chart of positive example counts per label]
In [ ]:
# Separating input and output columns (user_id and label_source are excluded from the inputs)
unneeded_columns = ['user_id', 'label_source']
output_columns = [col for col in combined_csv_data.columns if col.startswith('label:')]
input_columns = [col for col in combined_csv_data.columns if col not in output_columns and col not in unneeded_columns]
X_main = df.copy()[input_columns]
In [ ]:
# Checking input variables for missing values
def nan_percentage(df):
    nan_percentage = (df.isna().mean() * 100).round(2)
    nan_percentage_df = pd.DataFrame({'Variable': nan_percentage.index, 'NaN Percentage': nan_percentage.values})
    nan_percentage_df = nan_percentage_df.sort_values(by='NaN Percentage', ascending=True)
    
    # Plot setup
    plt.figure(figsize=(10, 35)) 
    plt.barh(nan_percentage_df['Variable'], nan_percentage_df['NaN Percentage'], color='skyblue')
    plt.xlabel('Percentage of Missing Values')
    plt.ylabel('Variables')
    plt.title('Missing Value Percentage by Variable')
    plt.grid(axis='x')
    plt.tight_layout() 
    plt.show()
    return nan_percentage_df

# Checking overall missing percentage
nan_percentage_df = nan_percentage(X_main)
[Figure: horizontal bar chart of missing-value percentage per variable]
In [ ]:
X_with_users = df.drop(columns=['label_source'])
In [ ]:
users = X_with_users['user_id'].unique()
len(users)
Out[ ]:
60
In [ ]:
features = input_columns
In [ ]:
# Initialize a list to store the counts for each user
nan_counts_list = []

# Loop through each user
for user in users:
    # Filter the DataFrame for the current user
    df_user = X_with_users[X_with_users['user_id'] == user]
    
    # Count the NaN values for each feature for the current user and add user_id to the series
    nan_count = df_user[features].isna().sum()
    nan_count['user_id'] = user  # Add user_id to the count
    
    # Append the count series to the list
    nan_counts_list.append(nan_count)

# Convert the list of Series to a DataFrame
nan_counts_per_user = pd.DataFrame(nan_counts_list)

# If needed, set the user_id as the index
nan_counts_per_user.set_index('user_id', inplace=True)


# Plot the heat map
plt.figure(figsize=(30, 30))
sns.heatmap(nan_counts_per_user, annot=False, cmap='Reds')
plt.title('Heatmap of Missing Values per User')
plt.xlabel('Features')
plt.ylabel('Users')
plt.show()
[Figure: heatmap of missing-value counts per user and feature]
In [ ]:
# Calculate the total number of rows for each user in the original DataFrame
user_total_length = X_with_users.groupby('user_id').size()

# Convert this to a DataFrame or a Series that can be added to nan_counts_per_user
user_total_length_df = user_total_length.to_frame(name='total_length')

# Merge this information with nan_counts_per_user
# Since nan_counts_per_user already has user_id as its index, we can directly add the new column
nan_counts_per_user['total_length'] = user_total_length_df['total_length']

# Now, nan_counts_per_user includes the total_length column
In [ ]:
# Calculate the total NaN count for each feature across all users
total_nan_counts = nan_counts_per_user.sum()

# Total number of rows across all users; this is the denominator for each feature's missing percentage
total_entries_per_feature = len(X_with_users)

# Calculate the percentage of missing data for each feature
percentage_missing = (total_nan_counts / total_entries_per_feature) * 100


# Threshold for removing columns; 0 means remove any column with missing data
threshold = 0

# Identify columns that exceed this threshold
columns_to_remove = percentage_missing[percentage_missing > threshold].index.tolist()

print("Removing ", len(columns_to_remove),"columns out of ", len(X_with_users.columns))
# Print out the columns to remove
# print("Columns to remove due to excessive missing data:", columns_to_remove)

features_to_include = [feature for feature in features if feature not in columns_to_remove]

if 'timestamp' in X_with_users.columns:
    X_with_users['timestamp_numeric'] = X_with_users['timestamp'].astype(np.int64) // 10**9
    # Ensure 'timestamp_numeric' is included and 'timestamp' is excluded from features_to_include
    features_to_include = [f for f in features_to_include if f != 'timestamp'] + ['timestamp_numeric']

# Continue with your existing preprocessing...
user_df = X_with_users[X_with_users['user_id'] == users[-1]]
median_values = user_df[features_to_include].median()
user_df = user_df[features_to_include].fillna(median_values)
Removing  192 columns out of  279

Testing to see whether we need Binary Relevance or Classifier Chains. We could also check Label Powerset, but the issue is that the number of label combinations can become very large since we have 52 labels.¶
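
For reference, a minimal sketch of how the three problem-transformation strategies would be set up with skmultilearn, each with the same RandomForest base classifier (only the Classifier Chain is actually trained below):

from sklearn.ensemble import RandomForestClassifier
from skmultilearn.problem_transform import BinaryRelevance, ClassifierChain, LabelPowerset

# Binary Relevance: one independent classifier per label (ignores label correlations)
br = BinaryRelevance(RandomForestClassifier())

# Classifier Chain: each classifier in the chain also sees the previous labels' predictions
cc = ClassifierChain(RandomForestClassifier())

# Label Powerset: treats every observed combination of the 52 labels as a single class,
# which is why the number of classes can explode
lp = LabelPowerset(RandomForestClassifier())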

In [ ]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import pandas as pd
import numpy as np


combined_csv_data_4_model = combined_csv_data.copy()
combined_csv_data_4_model['timestamp_numeric'] = combined_csv_data_4_model['timestamp'].astype(np.int64) // 10**9
combined_csv_data_4_model = combined_csv_data_4_model.drop(columns=['timestamp'])
X = combined_csv_data_4_model[features_to_include]
y = combined_csv_data_4_model[output_columns]

# Normalize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
In [ ]:
corr_matrix = y.corr(method='pearson')  # Compute the pairwise correlation matrix between labels

# Flatten the matrix, sort by absolute value while preserving names
corr_flat = corr_matrix.unstack()
corr_flat_sorted = corr_flat.abs().sort_values(ascending=False)

# Remove self-correlations
corr_flat_sorted = corr_flat_sorted[corr_flat_sorted < 1]

# Take the top 10 unique label-pair correlations for plotting
unique_pairs = corr_flat_sorted.drop_duplicates().head(10)

# Plotting
plt.figure(figsize=(10, 6))
unique_pairs.plot(kind='bar')
plt.title('Top Correlations Between Labels')
plt.xlabel('Label Pairs')
plt.ylabel('Correlation')
plt.xticks(rotation=45, ha='right')
plt.show()
[Figure: bar chart of the top 10 correlations between label pairs]

It seems like there is correlation between the labels, so let's use Classifier Chains.

In [ ]:
plt.figure(figsize = (20,20))
corr = y.corr(method = 'pearson')
corr_flat = corr.unstack().sort_values(ascending =False)

sns.heatmap(corr, annot = False, cmap = 'coolwarm')
plt.show()
[Figure: heatmap of pairwise Pearson correlations between labels]
In [ ]:
combined_csv_data.shape
Out[ ]:
(377346, 280)
In [ ]:
combined_csv_data_4_model = combined_csv_data.iloc[:37734, :].copy()  # roughly the first 10% of rows, as a quick feasibility subset
combined_csv_data_4_model['timestamp_numeric'] = combined_csv_data_4_model['timestamp'].astype(np.int64) // 10**9
combined_csv_data_4_model = combined_csv_data_4_model.drop(columns=['timestamp'])
X = combined_csv_data_4_model[features_to_include]
y = combined_csv_data_4_model[output_columns]

# Normalize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
In [ ]:
# Initialize Classifier Chain with a RandomForest base classifier
classifier = ClassifierChain(RandomForestClassifier())

# Train the Classifier Chain model
classifier.fit(X_train, y_train)

# Make predictions
predictions = classifier.predict(X_test)

# Note: accuracy_score expects single-label predictions,
# so for multi-label you might use another metric like hamming loss or a subset accuracy function
# Here's an example with a custom subset accuracy for multi-label
def subset_accuracy(y_true, y_pred):
    return (y_true == y_pred).all(axis=1).mean()

print("Subset Accuracy: ", subset_accuracy(y_test, predictions.toarray()))
Subset Accuracy:  0.956804028090632
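
Subset accuracy is strict: all 52 labels of an example must match for it to count as correct. A gentler complementary metric, assuming y_test and predictions from the cell above, is the Hamming loss:

from sklearn.metrics import hamming_loss

# Fraction of individual label assignments that are wrong (lower is better)
print("Hamming loss: ", hamming_loss(y_test, predictions.toarray()))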
In [ ]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import pandas as pd
import numpy as np
import pickle

user = users[0]

models_data = {
    'models': {},
    'accuracies': {}
}
In [ ]:
combined_csv_data_4_model = combined_csv_data.copy()
combined_csv_data_4_model['timestamp_numeric'] = combined_csv_data_4_model['timestamp'].astype(np.int64) // 10**9
combined_csv_data_4_model = combined_csv_data_4_model.drop(columns=['timestamp'])


import warnings
warnings.simplefilter(action='ignore', category=RuntimeWarning)

counter = 1

for user in users:
    
    user_df = combined_csv_data_4_model[combined_csv_data_4_model['user_id'] == user]
    
    print(f'Shape of df of user no. {counter} of id {user} is: {user_df.shape}')
    # Select this user's inputs and outputs
    X = user_df[features_to_include]
    y = user_df[output_columns]
    # Normalize features
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    # Split the dataset into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

    # Initialize Classifier Chain with a RandomForest base classifier
    classifier = ClassifierChain(RandomForestClassifier())
    # Train the Classifier Chain model
    classifier.fit(X_train, y_train)
    models_data['models'][user] = classifier
    # Make predictions
    predictions = classifier.predict(X_test)

    # Note: accuracy_score expects single-label predictions, so for multi-label
    # we use subset accuracy (exact-match ratio); hamming loss is another option.
    def subset_accuracy(y_true, y_pred):
        return (y_true == y_pred).all(axis=1).mean()

    accuracy = subset_accuracy(y_test, predictions.toarray())
    models_data['accuracies'][user] = accuracy
    print(f"Subset Accuracy for {user}: ", accuracy)

    # Persist the models and accuracies after each user so progress is not lost
    with open('clfs_2.pkl', 'wb') as file:
        pickle.dump(models_data, file)
    print('File Updated')
    counter += 1
Shape of df of user no. 1 of id 81536B0A-8DBF-4D8A-AC24-9543E2E4C8E0 is: (6407, 280)
Subset Accuracy for 81536B0A-8DBF-4D8A-AC24-9543E2E4C8E0:  0.9446177847113885
File Updated
Shape of df of user no. 2 of id 61359772-D8D8-480D-B623-7C636EAD0C81 is: (6079, 280)
Subset Accuracy for 61359772-D8D8-480D-B623-7C636EAD0C81:  0.9769736842105263
File Updated
Shape of df of user no. 3 of id 40E170A7-607B-4578-AF04-F021C3B0384A is: (7649, 280)
Subset Accuracy for 40E170A7-607B-4578-AF04-F021C3B0384A:  0.949673202614379
File Updated
Shape of df of user no. 4 of id 806289BC-AD52-4CC1-806C-0CDB14D65EB6 is: (9242, 280)
Subset Accuracy for 806289BC-AD52-4CC1-806C-0CDB14D65EB6:  0.9502433747971877
File Updated
Shape of df of user no. 5 of id 61976C24-1C50-4355-9C49-AAE44A7D09F6 is: (8730, 280)
Subset Accuracy for 61976C24-1C50-4355-9C49-AAE44A7D09F6:  0.9616265750286369
File Updated
Shape of df of user no. 6 of id D7D20E2E-FC78-405D-B346-DBD3FD8FC92B is: (6210, 280)
Subset Accuracy for D7D20E2E-FC78-405D-B346-DBD3FD8FC92B:  0.9082125603864735
File Updated
Shape of df of user no. 7 of id 7D9BB102-A612-4E2A-8E22-3159752F55D8 is: (1600, 280)
Subset Accuracy for 7D9BB102-A612-4E2A-8E22-3159752F55D8:  0.934375
File Updated
Shape of df of user no. 8 of id 5119D0F8-FCA8-4184-A4EB-19421A40DE0D is: (6617, 280)
Subset Accuracy for 5119D0F8-FCA8-4184-A4EB-19421A40DE0D:  0.9350453172205438
File Updated
Shape of df of user no. 9 of id 9DC38D04-E82E-4F29-AB52-B476535226F2 is: (9686, 280)
Subset Accuracy for 9DC38D04-E82E-4F29-AB52-B476535226F2:  0.8668730650154799
File Updated
Shape of df of user no. 10 of id A7599A50-24AE-46A6-8EA6-2576F1011D81 is: (3898, 280)
Subset Accuracy for A7599A50-24AE-46A6-8EA6-2576F1011D81:  0.9807692307692307
File Updated
Shape of df of user no. 11 of id 59EEFAE0-DEB0-4FFF-9250-54D2A03D0CF2 is: (7542, 280)
Subset Accuracy for 59EEFAE0-DEB0-4FFF-9250-54D2A03D0CF2:  0.9390324718356527
File Updated
Shape of df of user no. 12 of id 24E40C4C-A349-4F9F-93AB-01D00FB994AF is: (4771, 280)
Subset Accuracy for 24E40C4C-A349-4F9F-93AB-01D00FB994AF:  0.9099476439790576
File Updated
Shape of df of user no. 13 of id 9759096F-1119-4E19-A0AD-6F16989C7E1C is: (9959, 280)
Subset Accuracy for 9759096F-1119-4E19-A0AD-6F16989C7E1C:  0.9357429718875502
File Updated
Shape of df of user no. 14 of id 1155FF54-63D3-4AB2-9863-8385D0BD0A13 is: (2685, 280)
Subset Accuracy for 1155FF54-63D3-4AB2-9863-8385D0BD0A13:  0.8677839851024208
File Updated
Shape of df of user no. 15 of id 96A358A0-FFF2-4239-B93E-C7425B901B47 is: (5819, 280)
Subset Accuracy for 96A358A0-FFF2-4239-B93E-C7425B901B47:  0.9690721649484536
File Updated
Shape of df of user no. 16 of id 78A91A4E-4A51-4065-BDA7-94755F0BB3BB is: (11996, 280)
Subset Accuracy for 78A91A4E-4A51-4065-BDA7-94755F0BB3BB:  0.97
File Updated
Shape of df of user no. 17 of id F50235E0-DD67-4F2A-B00B-1F31ADA998B9 is: (2266, 280)
Subset Accuracy for F50235E0-DD67-4F2A-B00B-1F31ADA998B9:  0.8325991189427313
File Updated
Shape of df of user no. 18 of id 1538C99F-BA1E-4EFB-A949-6C7C47701B20 is: (6549, 280)
Subset Accuracy for 1538C99F-BA1E-4EFB-A949-6C7C47701B20:  0.9564885496183206
File Updated
Shape of df of user no. 19 of id 11B5EC4D-4133-4289-B475-4E737182A406 is: (8845, 280)
Subset Accuracy for 11B5EC4D-4133-4289-B475-4E737182A406:  0.9101187111362352
File Updated
Shape of df of user no. 20 of id 098A72A5-E3E5-4F54-A152-BBDA0DF7B694 is: (6813, 280)
Subset Accuracy for 098A72A5-E3E5-4F54-A152-BBDA0DF7B694:  0.9053558327219369
File Updated
Shape of df of user no. 21 of id 59818CD2-24D7-4D32-B133-24C2FE3801E5 is: (5947, 280)
Subset Accuracy for 59818CD2-24D7-4D32-B133-24C2FE3801E5:  0.9403361344537815
File Updated
Shape of df of user no. 22 of id 33A85C34-CFE4-4732-9E73-0A7AC861B27A is: (6172, 280)
Subset Accuracy for 33A85C34-CFE4-4732-9E73-0A7AC861B27A:  0.9465587044534413
File Updated
Shape of df of user no. 23 of id 00EABED2-271D-49D8-B599-1D4A09240601 is: (2287, 280)
Subset Accuracy for 00EABED2-271D-49D8-B599-1D4A09240601:  0.8013100436681223
File Updated
Shape of df of user no. 24 of id 136562B6-95B2-483D-88DC-065F28409FD2 is: (6218, 280)
Subset Accuracy for 136562B6-95B2-483D-88DC-065F28409FD2:  0.8593247588424437
File Updated
Shape of df of user no. 25 of id B9724848-C7E2-45F4-9B3F-A1F38D864495 is: (7626, 280)
Subset Accuracy for B9724848-C7E2-45F4-9B3F-A1F38D864495:  0.9462647444298821
File Updated
Shape of df of user no. 26 of id CF722AA9-2533-4E51-9FEB-9EAC84EE9AAC is: (3615, 280)
Subset Accuracy for CF722AA9-2533-4E51-9FEB-9EAC84EE9AAC:  0.8616874135546335
File Updated
Shape of df of user no. 27 of id FDAA70A1-42A3-4E3F-9AE3-3FDA412E03BF is: (4973, 280)
Subset Accuracy for FDAA70A1-42A3-4E3F-9AE3-3FDA412E03BF:  0.9587939698492463
File Updated
Shape of df of user no. 28 of id A5CDF89D-02A2-4EC1-89F8-F534FDABDD96 is: (6040, 280)
Subset Accuracy for A5CDF89D-02A2-4EC1-89F8-F534FDABDD96:  0.7971854304635762
File Updated
Shape of df of user no. 29 of id 0BFC35E2-4817-4865-BFA7-764742302A2D is: (3108, 280)
Subset Accuracy for 0BFC35E2-4817-4865-BFA7-764742302A2D:  0.905144694533762
File Updated
Shape of df of user no. 30 of id BEF6C611-50DA-4971-A040-87FB979F3FC1 is: (3451, 280)
Subset Accuracy for BEF6C611-50DA-4971-A040-87FB979F3FC1:  0.9507959479015919
File Updated
Shape of df of user no. 31 of id 4FC32141-E888-4BFF-8804-12559A491D8C is: (4979, 280)
Subset Accuracy for 4FC32141-E888-4BFF-8804-12559A491D8C:  0.9257028112449799
File Updated
Shape of df of user no. 32 of id A76A5AF5-5A93-4CF2-A16E-62353BB70E8A is: (7520, 280)
Subset Accuracy for A76A5AF5-5A93-4CF2-A16E-62353BB70E8A:  0.9394946808510638
File Updated
Shape of df of user no. 33 of id 3600D531-0C55-44A7-AE95-A7A38519464E is: (5203, 280)
Subset Accuracy for 3600D531-0C55-44A7-AE95-A7A38519464E:  0.9615754082612872
File Updated
Shape of df of user no. 34 of id 2C32C23E-E30C-498A-8DD2-0EFB9150A02E is: (8516, 280)
Subset Accuracy for 2C32C23E-E30C-498A-8DD2-0EFB9150A02E:  0.9401408450704225
File Updated
Shape of df of user no. 35 of id 86A4F379-B305-473D-9D83-FC7D800180EF is: (10738, 280)
Subset Accuracy for 86A4F379-B305-473D-9D83-FC7D800180EF:  0.9762569832402235
File Updated
Shape of df of user no. 36 of id 99B204C0-DD5C-4BB7-83E8-A37281B8D769 is: (6038, 280)
Subset Accuracy for 99B204C0-DD5C-4BB7-83E8-A37281B8D769:  0.9271523178807947
File Updated
Shape of df of user no. 37 of id 74B86067-5D4B-43CF-82CF-341B76BEA0F4 is: (7298, 280)
Subset Accuracy for 74B86067-5D4B-43CF-82CF-341B76BEA0F4:  0.9541095890410959
File Updated
Shape of df of user no. 38 of id 5EF64122-B513-46AE-BCF1-E62AAC285D2C is: (3911, 280)
Subset Accuracy for 5EF64122-B513-46AE-BCF1-E62AAC285D2C:  0.9106002554278416
File Updated
Shape of df of user no. 39 of id B7F9D634-263E-4A97-87F9-6FFB4DDCB36C is: (9383, 280)
Subset Accuracy for B7F9D634-263E-4A97-87F9-6FFB4DDCB36C:  0.9355354288758657
File Updated
Shape of df of user no. 40 of id A5A30F76-581E-4757-97A2-957553A2C6AA is: (1667, 280)
Subset Accuracy for A5A30F76-581E-4757-97A2-957553A2C6AA:  0.8922155688622755
File Updated
Shape of df of user no. 41 of id C48CE857-A0DD-4DDB-BEA5-3A25449B2153 is: (5092, 280)
Subset Accuracy for C48CE857-A0DD-4DDB-BEA5-3A25449B2153:  0.9558390578999019
File Updated
Shape of df of user no. 42 of id 83CF687B-7CEC-434B-9FE8-00C3D5799BE6 is: (9539, 280)
Subset Accuracy for 83CF687B-7CEC-434B-9FE8-00C3D5799BE6:  0.9475890985324947
File Updated
Shape of df of user no. 43 of id 0A986513-7828-4D53-AA1F-E02D6DF9561B is: (3960, 280)
Subset Accuracy for 0A986513-7828-4D53-AA1F-E02D6DF9561B:  0.9570707070707071
File Updated
Shape of df of user no. 44 of id 7CE37510-56D0-4120-A1CF-0E23351428D2 is: (9761, 280)
Subset Accuracy for 7CE37510-56D0-4120-A1CF-0E23351428D2:  0.9406041986687148
File Updated
Shape of df of user no. 45 of id E65577C1-8D5D-4F70-AF23-B3ADB9D3DBA3 is: (3441, 280)
Subset Accuracy for E65577C1-8D5D-4F70-AF23-B3ADB9D3DBA3:  0.8127721335268505
File Updated
Shape of df of user no. 46 of id CCAF77F0-FABB-4F2F-9E24-D56AD0C5A82F is: (8472, 280)
Subset Accuracy for CCAF77F0-FABB-4F2F-9E24-D56AD0C5A82F:  0.9669616519174041
File Updated
Shape of df of user no. 47 of id CA820D43-E5E2-42EF-9798-BE56F776370B is: (7865, 280)
Subset Accuracy for CA820D43-E5E2-42EF-9798-BE56F776370B:  0.8951048951048951
File Updated
Shape of df of user no. 48 of id 8023FE1A-D3B0-4E2C-A57A-9321B7FC755F is: (9189, 280)
Subset Accuracy for 8023FE1A-D3B0-4E2C-A57A-9321B7FC755F:  0.9515778019586507
File Updated
Shape of df of user no. 49 of id 481F4DD2-7689-43B9-A2AA-C8772227162B is: (6691, 280)
Subset Accuracy for 481F4DD2-7689-43B9-A2AA-C8772227162B:  0.9051530993278566
File Updated
Shape of df of user no. 50 of id CDA3BBF7-6631-45E8-85BA-EEB416B32A3C is: (2860, 280)
Subset Accuracy for CDA3BBF7-6631-45E8-85BA-EEB416B32A3C:  0.9912587412587412
File Updated
Shape of df of user no. 51 of id 4E98F91F-4654-42EF-B908-A3389443F2E7 is: (3250, 280)
Subset Accuracy for 4E98F91F-4654-42EF-B908-A3389443F2E7:  0.9661538461538461
File Updated
Shape of df of user no. 52 of id ECECC2AB-D32F-4F90-B74C-E12A1C69BBE2 is: (3530, 280)
Subset Accuracy for ECECC2AB-D32F-4F90-B74C-E12A1C69BBE2:  0.9631728045325779
File Updated
Shape of df of user no. 53 of id B09E373F-8A54-44C8-895B-0039390B859F is: (8134, 280)
Subset Accuracy for B09E373F-8A54-44C8-895B-0039390B859F:  0.9157959434542102
File Updated
Shape of df of user no. 54 of id BE3CA5A6-A561-4BBD-B7C9-5DF6805400FC is: (8309, 280)
Subset Accuracy for BE3CA5A6-A561-4BBD-B7C9-5DF6805400FC:  0.9446450060168472
File Updated
Shape of df of user no. 55 of id 797D145F-3858-4A7F-A7C2-A4EB721E133C is: (3593, 280)
Subset Accuracy for 797D145F-3858-4A7F-A7C2-A4EB721E133C:  0.8887343532684284
File Updated
Shape of df of user no. 56 of id 1DBB0F6F-1F81-4A50-9DF4-CD62ACFA4842 is: (7375, 280)
Subset Accuracy for 1DBB0F6F-1F81-4A50-9DF4-CD62ACFA4842:  0.9010169491525424
File Updated
Shape of df of user no. 57 of id 665514DE-49DC-421F-8DCB-145D0B2609AD is: (9167, 280)
Subset Accuracy for 665514DE-49DC-421F-8DCB-145D0B2609AD:  0.9623773173391494
File Updated
Shape of df of user no. 58 of id 5152A2DF-FAF3-4BA8-9CA9-E66B32671A53 is: (6617, 280)
Subset Accuracy for 5152A2DF-FAF3-4BA8-9CA9-E66B32671A53:  0.9350453172205438
File Updated
Shape of df of user no. 59 of id 0E6184E1-90C0-48EE-B25A-F1ECB7B9714E is: (7521, 280)
Subset Accuracy for 0E6184E1-90C0-48EE-B25A-F1ECB7B9714E:  0.9408637873754153
File Updated
Shape of df of user no. 60 of id 27E04243-B138-4F40-A164-F40B60165CF3 is: (4927, 280)
Subset Accuracy for 27E04243-B138-4F40-A164-F40B60165CF3:  0.9655172413793104
File Updated
In [ ]:
print(f'Shape of df is: {combined_csv_data_4_model.shape}')
# Train one Classifier Chain on the combined data from all users
X = combined_csv_data_4_model[features_to_include]
y = combined_csv_data_4_model[output_columns]
# Normalize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Initialize Classifier Chain with a RandomForest base classifier
classifier = ClassifierChain(RandomForestClassifier())
# Train the Classifier Chain model
classifier.fit(X_train, y_train)
models_data['models']["all"] = classifier
    # Make predictions
print('Processing predictions for X_test.')
predictions = classifier.predict(X_test)

def subset_accuracy(y_true, y_pred):
    return (y_true == y_pred).all(axis=1).mean()
print('Processing accuracy.')

accuracy = subset_accuracy(y_test, predictions.toarray())
models_data['accuracies']["all"] = accuracy
print(f"Subset Accuracy for all together data is: ", accuracy)
with open('all_clfs_2.pkl', 'wb') as file:
    pickle.dump(models_data, file)
print('File Updated')
Shape of df is: (377346, 280)
Processing predictions for X_test.
Processing accuracy.
Subset Accuracy for all together data is:  0.78130382933616
File Updated

Using classifier chains to generate predictions for LSTM¶

Let's use the success of our classifier chains for individual users to generate predictions for our LSTM model. First we'll prepare the predictions generated by the classifier chains.

In [ ]:
combined_csv_data_4_model = combined_csv_data.copy()
combined_csv_data_4_model['timestamp_numeric'] = pd.to_datetime(combined_csv_data_4_model['timestamp']).astype(np.int64) // 10**9
combined_csv_data_4_model = combined_csv_data_4_model.drop(columns=['timestamp'])

# Creating user_specific_data dictionary
user_specific_data = {}
for user in users:
    user_df = combined_csv_data_4_model[combined_csv_data_4_model['user_id'] == user]
    
    # Sorting user_df by 'timestamp_numeric' to ensure temporal order
    user_df = user_df.sort_values(by='timestamp_numeric')
    
    user_specific_data[user] = user_df
In [ ]:
# Loading models from disk
with open('clfs_2.pkl', 'rb') as file:
    models_data = pickle.load(file)

# Defining function to generate predictions using classifier chains
def generate_classifier_chain_predictions(user_df, classifier_chain_model):
    # Normalize features
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(user_df[features_to_include])

    # Generate predictions
    predictions = classifier_chain_model.predict(X_scaled)
    
    # Return the predictions as an array
    return predictions.toarray()

user_predictions = {}
for user_id, user_data in user_specific_data.items():
    user_df = user_specific_data[user_id]
    classifier_chain_model = models_data['models'][user_id]  # Get the corresponding model for the user
    
    # Generate classifier chain predictions for the user
    user_predictions[user_id] = generate_classifier_chain_predictions(user_df, classifier_chain_model)

# Now 'user_predictions' contains predictions for each user that can be used as input for the LSTM
In [ ]:
# Iterating through user_predictions and print the shape of each user's predictions
for user_id, predictions in user_predictions.items():
    print(f"User ID: {user_id}, Shape: {np.array(predictions).shape}")

# Checking if all predictions are 2D arrays with a consistent second dimension
consistent_shape = True
second_dim = None

for predictions in user_predictions.values():
    np_predictions = np.array(predictions)
    if second_dim is None:
        second_dim = np_predictions.shape[1] if len(np_predictions.shape) > 1 else 0
    elif len(np_predictions.shape) <= 1 or np_predictions.shape[1] != second_dim:
        consistent_shape = False
        break

if consistent_shape and second_dim:
    print(f"All predictions are 2D arrays with a consistent second dimension: {second_dim}")
else:
    print("Predictions are not consistent 2D arrays or have varying second dimensions.")
User ID: 81536B0A-8DBF-4D8A-AC24-9543E2E4C8E0, Shape: (6407, 52)
User ID: 61359772-D8D8-480D-B623-7C636EAD0C81, Shape: (6079, 52)
User ID: 40E170A7-607B-4578-AF04-F021C3B0384A, Shape: (7649, 52)
User ID: 806289BC-AD52-4CC1-806C-0CDB14D65EB6, Shape: (9242, 52)
User ID: 61976C24-1C50-4355-9C49-AAE44A7D09F6, Shape: (8730, 52)
User ID: D7D20E2E-FC78-405D-B346-DBD3FD8FC92B, Shape: (6210, 52)
User ID: 7D9BB102-A612-4E2A-8E22-3159752F55D8, Shape: (1600, 52)
User ID: 5119D0F8-FCA8-4184-A4EB-19421A40DE0D, Shape: (6617, 52)
User ID: 9DC38D04-E82E-4F29-AB52-B476535226F2, Shape: (9686, 52)
User ID: A7599A50-24AE-46A6-8EA6-2576F1011D81, Shape: (3898, 52)
User ID: 59EEFAE0-DEB0-4FFF-9250-54D2A03D0CF2, Shape: (7542, 52)
User ID: 24E40C4C-A349-4F9F-93AB-01D00FB994AF, Shape: (4771, 52)
User ID: 9759096F-1119-4E19-A0AD-6F16989C7E1C, Shape: (9959, 52)
User ID: 1155FF54-63D3-4AB2-9863-8385D0BD0A13, Shape: (2685, 52)
User ID: 96A358A0-FFF2-4239-B93E-C7425B901B47, Shape: (5819, 52)
User ID: 78A91A4E-4A51-4065-BDA7-94755F0BB3BB, Shape: (11996, 52)
User ID: F50235E0-DD67-4F2A-B00B-1F31ADA998B9, Shape: (2266, 52)
User ID: 1538C99F-BA1E-4EFB-A949-6C7C47701B20, Shape: (6549, 52)
User ID: 11B5EC4D-4133-4289-B475-4E737182A406, Shape: (8845, 52)
User ID: 098A72A5-E3E5-4F54-A152-BBDA0DF7B694, Shape: (6813, 52)
User ID: 59818CD2-24D7-4D32-B133-24C2FE3801E5, Shape: (5947, 52)
User ID: 33A85C34-CFE4-4732-9E73-0A7AC861B27A, Shape: (6172, 52)
User ID: 00EABED2-271D-49D8-B599-1D4A09240601, Shape: (2287, 52)
User ID: 136562B6-95B2-483D-88DC-065F28409FD2, Shape: (6218, 52)
User ID: B9724848-C7E2-45F4-9B3F-A1F38D864495, Shape: (7626, 52)
User ID: CF722AA9-2533-4E51-9FEB-9EAC84EE9AAC, Shape: (3615, 52)
User ID: FDAA70A1-42A3-4E3F-9AE3-3FDA412E03BF, Shape: (4973, 52)
User ID: A5CDF89D-02A2-4EC1-89F8-F534FDABDD96, Shape: (6040, 52)
User ID: 0BFC35E2-4817-4865-BFA7-764742302A2D, Shape: (3108, 52)
User ID: BEF6C611-50DA-4971-A040-87FB979F3FC1, Shape: (3451, 52)
User ID: 4FC32141-E888-4BFF-8804-12559A491D8C, Shape: (4979, 52)
User ID: A76A5AF5-5A93-4CF2-A16E-62353BB70E8A, Shape: (7520, 52)
User ID: 3600D531-0C55-44A7-AE95-A7A38519464E, Shape: (5203, 52)
User ID: 2C32C23E-E30C-498A-8DD2-0EFB9150A02E, Shape: (8516, 52)
User ID: 86A4F379-B305-473D-9D83-FC7D800180EF, Shape: (10738, 52)
User ID: 99B204C0-DD5C-4BB7-83E8-A37281B8D769, Shape: (6038, 52)
User ID: 74B86067-5D4B-43CF-82CF-341B76BEA0F4, Shape: (7298, 52)
User ID: 5EF64122-B513-46AE-BCF1-E62AAC285D2C, Shape: (3911, 52)
User ID: B7F9D634-263E-4A97-87F9-6FFB4DDCB36C, Shape: (9383, 52)
User ID: A5A30F76-581E-4757-97A2-957553A2C6AA, Shape: (1667, 52)
User ID: C48CE857-A0DD-4DDB-BEA5-3A25449B2153, Shape: (5092, 52)
User ID: 83CF687B-7CEC-434B-9FE8-00C3D5799BE6, Shape: (9539, 52)
User ID: 0A986513-7828-4D53-AA1F-E02D6DF9561B, Shape: (3960, 52)
User ID: 7CE37510-56D0-4120-A1CF-0E23351428D2, Shape: (9761, 52)
User ID: E65577C1-8D5D-4F70-AF23-B3ADB9D3DBA3, Shape: (3441, 52)
User ID: CCAF77F0-FABB-4F2F-9E24-D56AD0C5A82F, Shape: (8472, 52)
User ID: CA820D43-E5E2-42EF-9798-BE56F776370B, Shape: (7865, 52)
User ID: 8023FE1A-D3B0-4E2C-A57A-9321B7FC755F, Shape: (9189, 52)
User ID: 481F4DD2-7689-43B9-A2AA-C8772227162B, Shape: (6691, 52)
User ID: CDA3BBF7-6631-45E8-85BA-EEB416B32A3C, Shape: (2860, 52)
User ID: 4E98F91F-4654-42EF-B908-A3389443F2E7, Shape: (3250, 52)
User ID: ECECC2AB-D32F-4F90-B74C-E12A1C69BBE2, Shape: (3530, 52)
User ID: B09E373F-8A54-44C8-895B-0039390B859F, Shape: (8134, 52)
User ID: BE3CA5A6-A561-4BBD-B7C9-5DF6805400FC, Shape: (8309, 52)
User ID: 797D145F-3858-4A7F-A7C2-A4EB721E133C, Shape: (3593, 52)
User ID: 1DBB0F6F-1F81-4A50-9DF4-CD62ACFA4842, Shape: (7375, 52)
User ID: 665514DE-49DC-421F-8DCB-145D0B2609AD, Shape: (9167, 52)
User ID: 5152A2DF-FAF3-4BA8-9CA9-E66B32671A53, Shape: (6617, 52)
User ID: 0E6184E1-90C0-48EE-B25A-F1ECB7B9714E, Shape: (7521, 52)
User ID: 27E04243-B138-4F40-A164-F40B60165CF3, Shape: (4927, 52)
All predictions are 2D arrays with a consistent second dimension: 52
In [ ]:
# Determine the length of the longest sequence
max_sequence_length = max([len(predictions) for predictions in user_predictions.values()])

# Pad sequences to have the same length and stack them
X_lstm = pad_sequences(list(user_predictions.values()), maxlen=max_sequence_length, padding='post', dtype='float64')

# Since we are using padding, we need to keep track of the original lengths of each user's predictions
# This will be useful when interpreting the model's predictions
original_lengths = [len(predictions) for predictions in user_predictions.values()]

# pad_sequences already returns shape (users, timesteps, 52); this reshape just makes the feature dimension explicit
X_lstm = X_lstm.reshape((X_lstm.shape[0], X_lstm.shape[1], 52))

print(f"LSTM input shape: {X_lstm.shape}")
LSTM input shape: (60, 11996, 52)
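
Because the shorter sequences are zero-padded up to 11996 timesteps, the LSTM below will also process the padded steps. One option (sketched here but not used in the model below) is a Masking layer, which tells downstream layers to skip timesteps whose values equal the mask value; note it would also mask genuine all-zero prediction vectors.

from tensorflow.keras.layers import Masking

# mask_value must match the padding value used by pad_sequences (0.0 here)
masking_layer = Masking(mask_value=0.0, input_shape=(X_lstm.shape[1], X_lstm.shape[2]))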
In [ ]:
# Creating user_specific_timestamps dictionary
user_specific_timestamps = {}
for user in users:
    # Retrieve the user's dataframe including the timestamp
    user_df = combined_csv_data[combined_csv_data['user_id'] == user]
    
    # Sort user_df by 'timestamp' to ensure temporal order
    user_df = user_df.sort_values(by='timestamp')
    
    # Extract and store the timestamps for the user
    user_specific_timestamps[user] = user_df['timestamp'].values

# Now 'user_specific_timestamps' contains the ordered timestamps for each user
In [ ]:
# Pad the timestamps to the same length as the sequences;
# NaN marks padded positions that do not correspond to real timestamps
padded_timestamps = pad_sequences(list(user_specific_timestamps.values()), maxlen=max_sequence_length, padding='post', value=np.nan, dtype='float64')

# Keep track of the original lengths to filter out the padded timestamps later
original_timestamp_lengths = [len(ts) for ts in user_specific_timestamps.values()]

Now let's prepare the LSTM model

In [ ]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.optimizers import Adam

model = Sequential([
    LSTM(50, input_shape=(X_lstm.shape[1], X_lstm.shape[2]), return_sequences=True),
    Dropout(0.5),
    LSTM(50, return_sequences=False),
    Dropout(0.5),
    Dense(100, activation='relu'),
    Dense(len(label_columns), activation='sigmoid')
])

model.compile(optimizer=Adam(learning_rate=0.001),
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.summary()
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 lstm_2 (LSTM)               (None, 11996, 50)         20600     
                                                                 
 dropout_2 (Dropout)         (None, 11996, 50)         0         
                                                                 
 lstm_3 (LSTM)               (None, 50)                20200     
                                                                 
 dropout_3 (Dropout)         (None, 50)                0         
                                                                 
 dense_2 (Dense)             (None, 100)               5100      
                                                                 
 dense_3 (Dense)             (None, 52)                5252      
                                                                 
=================================================================
Total params: 51152 (199.81 KB)
Trainable params: 51152 (199.81 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
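
As a sanity check on the parameter counts in the summary: an LSTM layer has 4 * ((input_dim + units) * units + units) weights (four gates, each with input weights, recurrent weights, and a bias), and a Dense layer has input_dim * units + units. A small sketch reproducing the numbers above:

def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights, and a bias vector
    return 4 * ((input_dim + units) * units + units)

def dense_params(input_dim, units):
    return input_dim * units + units

print(lstm_params(52, 50))    # 20600 (lstm_2)
print(lstm_params(50, 50))    # 20200 (lstm_3)
print(dense_params(50, 100))  # 5100  (dense_2)
print(dense_params(100, 52))  # 5252  (dense_3)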
In [ ]:
user_labels = {}
for user_id in users:
    # Extract the ground-truth labels for the current user, in the same row order as the features
    user_labels_df = combined_csv_data[combined_csv_data['user_id'] == user_id]
    
    labels_array = user_labels_df[label_columns].values 
    
    user_labels[user_id] = labels_array
In [ ]:
for user_id, labels in user_labels.items():
    print(f"User ID: {user_id}, Labels shape: {labels.shape}")
User ID: 81536B0A-8DBF-4D8A-AC24-9543E2E4C8E0, Labels shape: (6407, 52)
User ID: 61359772-D8D8-480D-B623-7C636EAD0C81, Labels shape: (6079, 52)
User ID: 40E170A7-607B-4578-AF04-F021C3B0384A, Labels shape: (7649, 52)
User ID: 806289BC-AD52-4CC1-806C-0CDB14D65EB6, Labels shape: (9242, 52)
User ID: 61976C24-1C50-4355-9C49-AAE44A7D09F6, Labels shape: (8730, 52)
User ID: D7D20E2E-FC78-405D-B346-DBD3FD8FC92B, Labels shape: (6210, 52)
User ID: 7D9BB102-A612-4E2A-8E22-3159752F55D8, Labels shape: (1600, 52)
User ID: 5119D0F8-FCA8-4184-A4EB-19421A40DE0D, Labels shape: (6617, 52)
User ID: 9DC38D04-E82E-4F29-AB52-B476535226F2, Labels shape: (9686, 52)
User ID: A7599A50-24AE-46A6-8EA6-2576F1011D81, Labels shape: (3898, 52)
User ID: 59EEFAE0-DEB0-4FFF-9250-54D2A03D0CF2, Labels shape: (7542, 52)
User ID: 24E40C4C-A349-4F9F-93AB-01D00FB994AF, Labels shape: (4771, 52)
User ID: 9759096F-1119-4E19-A0AD-6F16989C7E1C, Labels shape: (9959, 52)
User ID: 1155FF54-63D3-4AB2-9863-8385D0BD0A13, Labels shape: (2685, 52)
User ID: 96A358A0-FFF2-4239-B93E-C7425B901B47, Labels shape: (5819, 52)
User ID: 78A91A4E-4A51-4065-BDA7-94755F0BB3BB, Labels shape: (11996, 52)
User ID: F50235E0-DD67-4F2A-B00B-1F31ADA998B9, Labels shape: (2266, 52)
User ID: 1538C99F-BA1E-4EFB-A949-6C7C47701B20, Labels shape: (6549, 52)
User ID: 11B5EC4D-4133-4289-B475-4E737182A406, Labels shape: (8845, 52)
User ID: 098A72A5-E3E5-4F54-A152-BBDA0DF7B694, Labels shape: (6813, 52)
User ID: 59818CD2-24D7-4D32-B133-24C2FE3801E5, Labels shape: (5947, 52)
User ID: 33A85C34-CFE4-4732-9E73-0A7AC861B27A, Labels shape: (6172, 52)
User ID: 00EABED2-271D-49D8-B599-1D4A09240601, Labels shape: (2287, 52)
User ID: 136562B6-95B2-483D-88DC-065F28409FD2, Labels shape: (6218, 52)
User ID: B9724848-C7E2-45F4-9B3F-A1F38D864495, Labels shape: (7626, 52)
User ID: CF722AA9-2533-4E51-9FEB-9EAC84EE9AAC, Labels shape: (3615, 52)
User ID: FDAA70A1-42A3-4E3F-9AE3-3FDA412E03BF, Labels shape: (4973, 52)
User ID: A5CDF89D-02A2-4EC1-89F8-F534FDABDD96, Labels shape: (6040, 52)
User ID: 0BFC35E2-4817-4865-BFA7-764742302A2D, Labels shape: (3108, 52)
User ID: BEF6C611-50DA-4971-A040-87FB979F3FC1, Labels shape: (3451, 52)
User ID: 4FC32141-E888-4BFF-8804-12559A491D8C, Labels shape: (4979, 52)
User ID: A76A5AF5-5A93-4CF2-A16E-62353BB70E8A, Labels shape: (7520, 52)
User ID: 3600D531-0C55-44A7-AE95-A7A38519464E, Labels shape: (5203, 52)
User ID: 2C32C23E-E30C-498A-8DD2-0EFB9150A02E, Labels shape: (8516, 52)
User ID: 86A4F379-B305-473D-9D83-FC7D800180EF, Labels shape: (10738, 52)
User ID: 99B204C0-DD5C-4BB7-83E8-A37281B8D769, Labels shape: (6038, 52)
User ID: 74B86067-5D4B-43CF-82CF-341B76BEA0F4, Labels shape: (7298, 52)
User ID: 5EF64122-B513-46AE-BCF1-E62AAC285D2C, Labels shape: (3911, 52)
User ID: B7F9D634-263E-4A97-87F9-6FFB4DDCB36C, Labels shape: (9383, 52)
User ID: A5A30F76-581E-4757-97A2-957553A2C6AA, Labels shape: (1667, 52)
User ID: C48CE857-A0DD-4DDB-BEA5-3A25449B2153, Labels shape: (5092, 52)
User ID: 83CF687B-7CEC-434B-9FE8-00C3D5799BE6, Labels shape: (9539, 52)
User ID: 0A986513-7828-4D53-AA1F-E02D6DF9561B, Labels shape: (3960, 52)
User ID: 7CE37510-56D0-4120-A1CF-0E23351428D2, Labels shape: (9761, 52)
User ID: E65577C1-8D5D-4F70-AF23-B3ADB9D3DBA3, Labels shape: (3441, 52)
User ID: CCAF77F0-FABB-4F2F-9E24-D56AD0C5A82F, Labels shape: (8472, 52)
User ID: CA820D43-E5E2-42EF-9798-BE56F776370B, Labels shape: (7865, 52)
User ID: 8023FE1A-D3B0-4E2C-A57A-9321B7FC755F, Labels shape: (9189, 52)
User ID: 481F4DD2-7689-43B9-A2AA-C8772227162B, Labels shape: (6691, 52)
User ID: CDA3BBF7-6631-45E8-85BA-EEB416B32A3C, Labels shape: (2860, 52)
User ID: 4E98F91F-4654-42EF-B908-A3389443F2E7, Labels shape: (3250, 52)
User ID: ECECC2AB-D32F-4F90-B74C-E12A1C69BBE2, Labels shape: (3530, 52)
User ID: B09E373F-8A54-44C8-895B-0039390B859F, Labels shape: (8134, 52)
User ID: BE3CA5A6-A561-4BBD-B7C9-5DF6805400FC, Labels shape: (8309, 52)
User ID: 797D145F-3858-4A7F-A7C2-A4EB721E133C, Labels shape: (3593, 52)
User ID: 1DBB0F6F-1F81-4A50-9DF4-CD62ACFA4842, Labels shape: (7375, 52)
User ID: 665514DE-49DC-421F-8DCB-145D0B2609AD, Labels shape: (9167, 52)
User ID: 5152A2DF-FAF3-4BA8-9CA9-E66B32671A53, Labels shape: (6617, 52)
User ID: 0E6184E1-90C0-48EE-B25A-F1ECB7B9714E, Labels shape: (7521, 52)
User ID: 27E04243-B138-4F40-A164-F40B60165CF3, Labels shape: (4927, 52)
In [ ]:
padded_labels = []

# Loop over each user's labels
for user_id, labels in user_labels.items():
    # Pad the user's label array to have the same length as the max_sequence_length
    # We use the same 'post' padding to align with the input sequences
    padded_label = pad_sequences([labels], maxlen=max_sequence_length, padding='post', dtype='float64')[0]
    padded_labels.append(padded_label)

# Convert the list of padded label arrays into a single NumPy array
y_lstm = np.array(padded_labels)

print(f"Padded labels shape: {y_lstm.shape}")
Padded labels shape: (60, 11996, 52)
In [ ]:
# Assume each user's sequence maps to a single set of labels rather than one label vector per timestep,
# so reduce y_lstm to two dimensions: (number of users, number of labels).
# Note: this keeps only the labels from each user's first timestep.
y_lstm = y_lstm[:, 0, :]

print(f"Adjusted labels shape: {y_lstm.shape}")
Adjusted labels shape: (60, 52)
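
Keeping only the first timestep's labels discards most of each user's annotations. A minimal alternative sketch (an assumption, not wired into the training above): mark a label as positive for a user if it is positive at any of that user's timesteps, treating missing (NaN) labels as negative. It reuses the padded_labels list built earlier.

In [ ]:
# Sketch (assumption): aggregate labels over time instead of taking timestep 0.
import numpy as np

labels_3d = np.array(padded_labels)              # shape: (users, max_sequence_length, num_labels)
labels_3d = np.nan_to_num(labels_3d, nan=0.0)    # treat missing labels as negative
y_lstm_any = (labels_3d.max(axis=1) > 0).astype('float64')  # (users, num_labels)
print(y_lstm_any.shape)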
In [ ]:
X_train, X_val, y_train, y_val = train_test_split(X_lstm, y_lstm, test_size=0.2, random_state=42)
In [ ]:
# Training model
history = model.fit(
    X_train, 
    y_train, 
    epochs=10, 
    batch_size=64, 
    validation_data=(X_val, y_val),
    verbose=1
)

# Evaluating model
val_loss, val_acc = model.evaluate(X_val, y_val, verbose=0)

print(f'Validation accuracy: {val_acc}, Validation loss: {val_loss}')
Epoch 1/10
1/1 [==============================] - 23s 23s/step - loss: 0.6930 - accuracy: 0.0000e+00 - val_loss: 0.6921 - val_accuracy: 0.0000e+00
Epoch 2/10
1/1 [==============================] - 15s 15s/step - loss: 0.6920 - accuracy: 0.0417 - val_loss: 0.6908 - val_accuracy: 0.0000e+00
Epoch 3/10
1/1 [==============================] - 13s 13s/step - loss: 0.6907 - accuracy: 0.0417 - val_loss: 0.6892 - val_accuracy: 0.0000e+00
Epoch 4/10
1/1 [==============================] - 15s 15s/step - loss: 0.6892 - accuracy: 0.0625 - val_loss: 0.6873 - val_accuracy: 0.0000e+00
Epoch 5/10
1/1 [==============================] - 15s 15s/step - loss: 0.6868 - accuracy: 0.0417 - val_loss: 0.6850 - val_accuracy: 0.0000e+00
Epoch 6/10
1/1 [==============================] - 13s 13s/step - loss: 0.6844 - accuracy: 0.0417 - val_loss: 0.6820 - val_accuracy: 0.0000e+00
Epoch 7/10
1/1 [==============================] - 13s 13s/step - loss: 0.6815 - accuracy: 0.0000e+00 - val_loss: 0.6782 - val_accuracy: 0.0000e+00
Epoch 8/10
1/1 [==============================] - 13s 13s/step - loss: 0.6773 - accuracy: 0.0000e+00 - val_loss: 0.6733 - val_accuracy: 0.0000e+00
Epoch 9/10
1/1 [==============================] - 13s 13s/step - loss: 0.6728 - accuracy: 0.0208 - val_loss: 0.6669 - val_accuracy: 0.0000e+00
Epoch 10/10
1/1 [==============================] - 13s 13s/step - loss: 0.6649 - accuracy: 0.0000e+00 - val_loss: 0.6582 - val_accuracy: 0.0000e+00
Validation accuracy: 0.0, Validation loss: 0.6582168936729431
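
The Keras 'accuracy' metric is hard to interpret for a 52-way multi-label output, which is part of why the numbers above hover near zero. A hedged sketch of two more readable metrics, assuming the model and X_val/y_val from the cells above and that y_val contains only 0/1 values (missing labels filled with 0):

In [ ]:
# Sketch: per-label accuracy and exact-match (subset) accuracy for the validation set.
import numpy as np

val_probs = model.predict(X_val)
val_preds = (val_probs > 0.5).astype(int)

per_label_acc = (val_preds == y_val).mean(axis=0)            # fraction correct per label
subset_acc = (val_preds == y_val).all(axis=1).mean()         # all 52 labels correct at once
print(f"Mean per-label accuracy: {per_label_acc.mean():.4f}")
print(f"Subset (exact-match) accuracy: {subset_acc:.4f}")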
In [ ]:
# Examining predictions from model 
# Generating predictions for the validation set
predictions = model.predict(X_val)

# Applying threshold to convert probabilities to binary values
binary_predictions = (predictions > 0.5).astype(int)

# Let's examine a few predictions to get a sense of what our model is doing
for i, prediction in enumerate(binary_predictions[:5]):
    print(f"Prediction for sample {i}: {prediction}")
1/1 [==============================] - 1s 1s/step
Prediction for sample 0: [0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0]
Prediction for sample 1: [0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0]
Prediction for sample 2: [0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0]
Prediction for sample 3: [0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0]
Prediction for sample 4: [0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0]
In [ ]:
# Let's map these to the label names so we know what the predictions mean
for i, prediction in enumerate(binary_predictions[:5]):
    labeled_prediction = dict(zip(label_columns, prediction))
    print(f"Prediction for sample {i}: {labeled_prediction}")
Prediction for sample 0: {'label:LYING_DOWN': 0, 'label:SITTING': 0, 'label:FIX_walking': 0, 'label:FIX_running': 0, 'label:BICYCLING': 0, 'label:SLEEPING': 0, 'label:LAB_WORK': 0, 'label:IN_CLASS': 0, 'label:IN_A_MEETING': 1, 'label:LOC_main_workplace': 0, 'label:OR_indoors': 0, 'label:OR_outside': 1, 'label:IN_A_CAR': 0, 'label:ON_A_BUS': 0, 'label:DRIVE_-_I_M_THE_DRIVER': 0, 'label:DRIVE_-_I_M_A_PASSENGER': 0, 'label:LOC_home': 0, 'label:FIX_restaurant': 0, 'label:PHONE_IN_POCKET': 0, 'label:OR_exercise': 0, 'label:COOKING': 0, 'label:SHOPPING': 0, 'label:STROLLING': 1, 'label:DRINKING__ALCOHOL_': 0, 'label:BATHING_-_SHOWER': 0, 'label:CLEANING': 0, 'label:DOING_LAUNDRY': 0, 'label:WASHING_DISHES': 0, 'label:WATCHING_TV': 0, 'label:SURFING_THE_INTERNET': 0, 'label:AT_A_PARTY': 0, 'label:AT_A_BAR': 0, 'label:LOC_beach': 0, 'label:SINGING': 0, 'label:TALKING': 0, 'label:COMPUTER_WORK': 0, 'label:EATING': 0, 'label:TOILET': 0, 'label:GROOMING': 1, 'label:DRESSING': 0, 'label:AT_THE_GYM': 0, 'label:STAIRS_-_GOING_UP': 0, 'label:STAIRS_-_GOING_DOWN': 0, 'label:ELEVATOR': 0, 'label:OR_standing': 0, 'label:AT_SCHOOL': 0, 'label:PHONE_IN_HAND': 0, 'label:PHONE_IN_BAG': 0, 'label:PHONE_ON_TABLE': 0, 'label:WITH_CO-WORKERS': 0, 'label:WITH_FRIENDS': 0, 'label:UNKNOWN': 0}
Prediction for sample 1: {'label:LYING_DOWN': 0, 'label:SITTING': 0, 'label:FIX_walking': 0, 'label:FIX_running': 0, 'label:BICYCLING': 0, 'label:SLEEPING': 0, 'label:LAB_WORK': 0, 'label:IN_CLASS': 0, 'label:IN_A_MEETING': 1, 'label:LOC_main_workplace': 0, 'label:OR_indoors': 0, 'label:OR_outside': 1, 'label:IN_A_CAR': 0, 'label:ON_A_BUS': 0, 'label:DRIVE_-_I_M_THE_DRIVER': 0, 'label:DRIVE_-_I_M_A_PASSENGER': 0, 'label:LOC_home': 0, 'label:FIX_restaurant': 0, 'label:PHONE_IN_POCKET': 0, 'label:OR_exercise': 0, 'label:COOKING': 0, 'label:SHOPPING': 0, 'label:STROLLING': 1, 'label:DRINKING__ALCOHOL_': 0, 'label:BATHING_-_SHOWER': 0, 'label:CLEANING': 0, 'label:DOING_LAUNDRY': 0, 'label:WASHING_DISHES': 0, 'label:WATCHING_TV': 0, 'label:SURFING_THE_INTERNET': 0, 'label:AT_A_PARTY': 0, 'label:AT_A_BAR': 0, 'label:LOC_beach': 0, 'label:SINGING': 0, 'label:TALKING': 0, 'label:COMPUTER_WORK': 0, 'label:EATING': 0, 'label:TOILET': 0, 'label:GROOMING': 1, 'label:DRESSING': 0, 'label:AT_THE_GYM': 0, 'label:STAIRS_-_GOING_UP': 0, 'label:STAIRS_-_GOING_DOWN': 0, 'label:ELEVATOR': 0, 'label:OR_standing': 0, 'label:AT_SCHOOL': 0, 'label:PHONE_IN_HAND': 0, 'label:PHONE_IN_BAG': 0, 'label:PHONE_ON_TABLE': 0, 'label:WITH_CO-WORKERS': 0, 'label:WITH_FRIENDS': 0, 'label:UNKNOWN': 0}
Prediction for sample 2: {'label:LYING_DOWN': 0, 'label:SITTING': 0, 'label:FIX_walking': 0, 'label:FIX_running': 0, 'label:BICYCLING': 0, 'label:SLEEPING': 0, 'label:LAB_WORK': 0, 'label:IN_CLASS': 0, 'label:IN_A_MEETING': 1, 'label:LOC_main_workplace': 0, 'label:OR_indoors': 0, 'label:OR_outside': 1, 'label:IN_A_CAR': 0, 'label:ON_A_BUS': 0, 'label:DRIVE_-_I_M_THE_DRIVER': 0, 'label:DRIVE_-_I_M_A_PASSENGER': 0, 'label:LOC_home': 0, 'label:FIX_restaurant': 0, 'label:PHONE_IN_POCKET': 0, 'label:OR_exercise': 0, 'label:COOKING': 0, 'label:SHOPPING': 0, 'label:STROLLING': 1, 'label:DRINKING__ALCOHOL_': 0, 'label:BATHING_-_SHOWER': 0, 'label:CLEANING': 0, 'label:DOING_LAUNDRY': 0, 'label:WASHING_DISHES': 0, 'label:WATCHING_TV': 0, 'label:SURFING_THE_INTERNET': 0, 'label:AT_A_PARTY': 0, 'label:AT_A_BAR': 0, 'label:LOC_beach': 0, 'label:SINGING': 0, 'label:TALKING': 0, 'label:COMPUTER_WORK': 0, 'label:EATING': 0, 'label:TOILET': 0, 'label:GROOMING': 1, 'label:DRESSING': 0, 'label:AT_THE_GYM': 0, 'label:STAIRS_-_GOING_UP': 0, 'label:STAIRS_-_GOING_DOWN': 0, 'label:ELEVATOR': 0, 'label:OR_standing': 0, 'label:AT_SCHOOL': 0, 'label:PHONE_IN_HAND': 0, 'label:PHONE_IN_BAG': 0, 'label:PHONE_ON_TABLE': 0, 'label:WITH_CO-WORKERS': 0, 'label:WITH_FRIENDS': 0, 'label:UNKNOWN': 0}
Prediction for sample 3: {'label:LYING_DOWN': 0, 'label:SITTING': 0, 'label:FIX_walking': 0, 'label:FIX_running': 0, 'label:BICYCLING': 0, 'label:SLEEPING': 0, 'label:LAB_WORK': 0, 'label:IN_CLASS': 0, 'label:IN_A_MEETING': 1, 'label:LOC_main_workplace': 0, 'label:OR_indoors': 0, 'label:OR_outside': 1, 'label:IN_A_CAR': 0, 'label:ON_A_BUS': 0, 'label:DRIVE_-_I_M_THE_DRIVER': 0, 'label:DRIVE_-_I_M_A_PASSENGER': 0, 'label:LOC_home': 0, 'label:FIX_restaurant': 0, 'label:PHONE_IN_POCKET': 0, 'label:OR_exercise': 0, 'label:COOKING': 0, 'label:SHOPPING': 0, 'label:STROLLING': 1, 'label:DRINKING__ALCOHOL_': 0, 'label:BATHING_-_SHOWER': 0, 'label:CLEANING': 0, 'label:DOING_LAUNDRY': 0, 'label:WASHING_DISHES': 0, 'label:WATCHING_TV': 0, 'label:SURFING_THE_INTERNET': 0, 'label:AT_A_PARTY': 0, 'label:AT_A_BAR': 0, 'label:LOC_beach': 0, 'label:SINGING': 0, 'label:TALKING': 0, 'label:COMPUTER_WORK': 0, 'label:EATING': 0, 'label:TOILET': 0, 'label:GROOMING': 1, 'label:DRESSING': 0, 'label:AT_THE_GYM': 0, 'label:STAIRS_-_GOING_UP': 0, 'label:STAIRS_-_GOING_DOWN': 0, 'label:ELEVATOR': 0, 'label:OR_standing': 0, 'label:AT_SCHOOL': 0, 'label:PHONE_IN_HAND': 0, 'label:PHONE_IN_BAG': 0, 'label:PHONE_ON_TABLE': 0, 'label:WITH_CO-WORKERS': 0, 'label:WITH_FRIENDS': 0, 'label:UNKNOWN': 0}
Prediction for sample 4: {'label:LYING_DOWN': 0, 'label:SITTING': 0, 'label:FIX_walking': 0, 'label:FIX_running': 0, 'label:BICYCLING': 0, 'label:SLEEPING': 0, 'label:LAB_WORK': 0, 'label:IN_CLASS': 0, 'label:IN_A_MEETING': 1, 'label:LOC_main_workplace': 0, 'label:OR_indoors': 0, 'label:OR_outside': 1, 'label:IN_A_CAR': 0, 'label:ON_A_BUS': 0, 'label:DRIVE_-_I_M_THE_DRIVER': 0, 'label:DRIVE_-_I_M_A_PASSENGER': 0, 'label:LOC_home': 0, 'label:FIX_restaurant': 0, 'label:PHONE_IN_POCKET': 0, 'label:OR_exercise': 0, 'label:COOKING': 0, 'label:SHOPPING': 0, 'label:STROLLING': 1, 'label:DRINKING__ALCOHOL_': 0, 'label:BATHING_-_SHOWER': 0, 'label:CLEANING': 0, 'label:DOING_LAUNDRY': 0, 'label:WASHING_DISHES': 0, 'label:WATCHING_TV': 0, 'label:SURFING_THE_INTERNET': 0, 'label:AT_A_PARTY': 0, 'label:AT_A_BAR': 0, 'label:LOC_beach': 0, 'label:SINGING': 0, 'label:TALKING': 0, 'label:COMPUTER_WORK': 0, 'label:EATING': 0, 'label:TOILET': 0, 'label:GROOMING': 1, 'label:DRESSING': 0, 'label:AT_THE_GYM': 0, 'label:STAIRS_-_GOING_UP': 0, 'label:STAIRS_-_GOING_DOWN': 0, 'label:ELEVATOR': 0, 'label:OR_standing': 0, 'label:AT_SCHOOL': 0, 'label:PHONE_IN_HAND': 0, 'label:PHONE_IN_BAG': 0, 'label:PHONE_ON_TABLE': 0, 'label:WITH_CO-WORKERS': 0, 'label:WITH_FRIENDS': 0, 'label:UNKNOWN': 0}
In [ ]:
# Load all predictions into a dataframe
predictions_df = pd.DataFrame(binary_predictions, columns=label_columns)

# Save the DataFrame to a CSV file with label names as column headers
predictions_df.to_csv('LSTM_model_predictions_with_labels.csv', index=False)
In [ ]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

X = y_lstm 
y = X_lstm.reshape(X_lstm.shape[0], -1) 

# Splitting the dataset for training and testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Training the linear model
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)

# Making predictions - predicting features based on labels
y_pred = linear_model.predict(X_test)

# Evaluating the model
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
Mean Squared Error: 0.04169533805008713
In [ ]:
predictions_df = pd.DataFrame(y_pred, columns=[f'Feature_{i}' for i in range(y_pred.shape[1])])
predictions_df.to_csv('predicted_features.csv', index=False)
print("Predictions saved to 'predicted_features.csv'.")
Predictions saved to 'predicted_features.csv'.
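
The single MSE above averages over all predicted features. A small follow-up sketch, assuming y_test and y_pred from the regression cell above, that breaks the error down per feature to show which ones the label-to-feature linear model reconstructs worst:

In [ ]:
# Sketch: per-feature MSE for the label -> feature linear model.
import numpy as np
from sklearn.metrics import mean_squared_error

per_feature_mse = mean_squared_error(y_test, y_pred, multioutput='raw_values')
worst = np.argsort(per_feature_mse)[::-1][:10]
for idx in worst:
    print(f"Feature_{idx}: MSE = {per_feature_mse[idx]:.6f}")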

Use a threshold of 0 to get all next-X predictions.

END OF NEW LSTM PREDICTION¶

Old numbers: shape of df is (377346, 280). Processing predictions for X_test; processing accuracy. Subset accuracy for the combined data: 0.40275606201139524. File updated.

In [ ]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import numpy as np
import pandas as pd
import pickle

# Define your MLP model
class MultiTaskMLP(nn.Module):
    def __init__(self, input_size, output_size):
        super(MultiTaskMLP, self).__init__()
        self.layer1 = nn.Linear(input_size, 64)
        self.layer2 = nn.Linear(64, 64)
        self.output_layer = nn.Linear(64, output_size)
    
    def forward(self, x):
        x = torch.relu(self.layer1(x))
        x = torch.relu(self.layer2(x))
        x = self.output_layer(x)
        return x

# Placeholder for models and accuracies
models_data_2 = {
    'models': {},
    'accuracies': {}
}

# Loop through each user
for user in users:
    user_df = combined_csv_data_4_model[combined_csv_data_4_model['user_id'] == user]
    X = user_df[features_to_include].values
    y = user_df[output_columns].values
    
    # Normalize features
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    
    # Split the dataset
    X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
    
    # Convert to PyTorch tensors
    X_train_tensor = torch.FloatTensor(X_train)
    y_train_tensor = torch.FloatTensor(y_train)
    X_test_tensor = torch.FloatTensor(X_test)
    y_test_tensor = torch.FloatTensor(y_test)
    
    # DataLoader
    train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
    train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
    
    test_dataset = TensorDataset(X_test_tensor, y_test_tensor)
    test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)
    
    # Initialize the model
    model = MultiTaskMLP(input_size=X_train_tensor.shape[1], output_size=y_train_tensor.shape[1])
    criterion = nn.BCEWithLogitsLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    
    # Training loop
    for epoch in range(10):  # Adjust epochs as needed
        model.train()
        for inputs, labels in train_loader:
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
    
    # Evaluation
    # Note: this accumulates a per-batch mean into 'correct' and then divides by the
    # number of test samples, which is why the per-user accuracies printed below all
    # sit near 1/batch_size; see the corrected sketch after this cell's output.
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, labels in test_loader:
            outputs = model(inputs)
            predicted = torch.sigmoid(outputs) > 0.5  # Threshold at 0.5
            total += labels.size(0)
            correct += (predicted == labels).float().mean()
    accuracy = correct / total
    
    # Store the model and accuracy
    models_data_2['models'][user] = model.state_dict()  # Store state dict for minimal size
    models_data_2['accuracies'][user] = accuracy.item()
    print(f"User {user}: Accuracy = {accuracy.item():.4f}")

# Save the models and accuracies
with open('mlp_models.pkl', 'wb') as file:
    pickle.dump(models_data_2, file)
User 81536B0A-8DBF-4D8A-AC24-9543E2E4C8E0: Accuracy = 0.0161
User 61359772-D8D8-480D-B623-7C636EAD0C81: Accuracy = 0.0151
User 40E170A7-607B-4578-AF04-F021C3B0384A: Accuracy = 0.0154
User 806289BC-AD52-4CC1-806C-0CDB14D65EB6: Accuracy = 0.0150
User 61976C24-1C50-4355-9C49-AAE44A7D09F6: Accuracy = 0.0155
User D7D20E2E-FC78-405D-B346-DBD3FD8FC92B: Accuracy = 0.0156
User 7D9BB102-A612-4E2A-8E22-3159752F55D8: Accuracy = 0.0149
User 5119D0F8-FCA8-4184-A4EB-19421A40DE0D: Accuracy = 0.0153
User 9DC38D04-E82E-4F29-AB52-B476535226F2: Accuracy = 0.0153
User A7599A50-24AE-46A6-8EA6-2576F1011D81: Accuracy = 0.0161
User 59EEFAE0-DEB0-4FFF-9250-54D2A03D0CF2: Accuracy = 0.0154
User 24E40C4C-A349-4F9F-93AB-01D00FB994AF: Accuracy = 0.0151
User 9759096F-1119-4E19-A0AD-6F16989C7E1C: Accuracy = 0.0157
User 1155FF54-63D3-4AB2-9863-8385D0BD0A13: Accuracy = 0.0162
User 96A358A0-FFF2-4239-B93E-C7425B901B47: Accuracy = 0.0160
User 78A91A4E-4A51-4065-BDA7-94755F0BB3BB: Accuracy = 0.0156
User F50235E0-DD67-4F2A-B00B-1F31ADA998B9: Accuracy = 0.0171
User 1538C99F-BA1E-4EFB-A949-6C7C47701B20: Accuracy = 0.0156
User 11B5EC4D-4133-4289-B475-4E737182A406: Accuracy = 0.0154
User 098A72A5-E3E5-4F54-A152-BBDA0DF7B694: Accuracy = 0.0158
User 59818CD2-24D7-4D32-B133-24C2FE3801E5: Accuracy = 0.0157
User 33A85C34-CFE4-4732-9E73-0A7AC861B27A: Accuracy = 0.0157
User 00EABED2-271D-49D8-B599-1D4A09240601: Accuracy = 0.0170
User 136562B6-95B2-483D-88DC-065F28409FD2: Accuracy = 0.0156
User B9724848-C7E2-45F4-9B3F-A1F38D864495: Accuracy = 0.0152
User CF722AA9-2533-4E51-9FEB-9EAC84EE9AAC: Accuracy = 0.0156
User FDAA70A1-42A3-4E3F-9AE3-3FDA412E03BF: Accuracy = 0.0158
User A5CDF89D-02A2-4EC1-89F8-F534FDABDD96: Accuracy = 0.0152
User 0BFC35E2-4817-4865-BFA7-764742302A2D: Accuracy = 0.0156
User BEF6C611-50DA-4971-A040-87FB979F3FC1: Accuracy = 0.0156
User 4FC32141-E888-4BFF-8804-12559A491D8C: Accuracy = 0.0156
User A76A5AF5-5A93-4CF2-A16E-62353BB70E8A: Accuracy = 0.0155
User 3600D531-0C55-44A7-AE95-A7A38519464E: Accuracy = 0.0158
User 2C32C23E-E30C-498A-8DD2-0EFB9150A02E: Accuracy = 0.0153
User 86A4F379-B305-473D-9D83-FC7D800180EF: Accuracy = 0.0156
User 99B204C0-DD5C-4BB7-83E8-A37281B8D769: Accuracy = 0.0153
User 74B86067-5D4B-43CF-82CF-341B76BEA0F4: Accuracy = 0.0154
User 5EF64122-B513-46AE-BCF1-E62AAC285D2C: Accuracy = 0.0161
User B7F9D634-263E-4A97-87F9-6FFB4DDCB36C: Accuracy = 0.0156
User A5A30F76-581E-4757-97A2-957553A2C6AA: Accuracy = 0.0174
User C48CE857-A0DD-4DDB-BEA5-3A25449B2153: Accuracy = 0.0151
User 83CF687B-7CEC-434B-9FE8-00C3D5799BE6: Accuracy = 0.0153
User 0A986513-7828-4D53-AA1F-E02D6DF9561B: Accuracy = 0.0160
User 7CE37510-56D0-4120-A1CF-0E23351428D2: Accuracy = 0.0153
User E65577C1-8D5D-4F70-AF23-B3ADB9D3DBA3: Accuracy = 0.0156
User CCAF77F0-FABB-4F2F-9E24-D56AD0C5A82F: Accuracy = 0.0156
User CA820D43-E5E2-42EF-9798-BE56F776370B: Accuracy = 0.0155
User 8023FE1A-D3B0-4E2C-A57A-9321B7FC755F: Accuracy = 0.0153
User 481F4DD2-7689-43B9-A2AA-C8772227162B: Accuracy = 0.0152
User CDA3BBF7-6631-45E8-85BA-EEB416B32A3C: Accuracy = 0.0154
User 4E98F91F-4654-42EF-B908-A3389443F2E7: Accuracy = 0.0165
User ECECC2AB-D32F-4F90-B74C-E12A1C69BBE2: Accuracy = 0.0165
User B09E373F-8A54-44C8-895B-0039390B859F: Accuracy = 0.0155
User BE3CA5A6-A561-4BBD-B7C9-5DF6805400FC: Accuracy = 0.0153
User 797D145F-3858-4A7F-A7C2-A4EB721E133C: Accuracy = 0.0163
User 1DBB0F6F-1F81-4A50-9DF4-CD62ACFA4842: Accuracy = 0.0156
User 665514DE-49DC-421F-8DCB-145D0B2609AD: Accuracy = 0.0153
User 5152A2DF-FAF3-4BA8-9CA9-E66B32671A53: Accuracy = 0.0153
User 0E6184E1-90C0-48EE-B25A-F1ECB7B9714E: Accuracy = 0.0154
User 27E04243-B138-4F40-A164-F40B60165CF3: Accuracy = 0.0157
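
The ~0.015 values above come from dividing a per-batch mean by the number of test samples. A minimal sketch of an element-wise multi-label accuracy, assuming the model and test_loader left over from the last user in the loop above:

In [ ]:
# Sketch: element-wise multi-label accuracy that does not mix a per-batch
# mean with a sample count (the cause of the ~0.015 values above).
import torch

model.eval()
correct_elems = 0
total_elems = 0
with torch.no_grad():
    for inputs, labels in test_loader:
        preds = (torch.sigmoid(model(inputs)) > 0.5).float()
        correct_elems += (preds == labels).sum().item()
        total_elems += labels.numel()
print(f"Element-wise accuracy: {correct_elems / total_elems:.4f}")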
In [ ]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
import numpy as np

# Assuming 'combined_csv_data_4_model', 'features_to_include', and 'output_columns' are defined

# Normalize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(combined_csv_data_4_model[features_to_include])

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, combined_csv_data_4_model[output_columns], test_size=0.2, random_state=42)


# Convert dataset to tensors
X_train_tensor = torch.FloatTensor(X_train)
y_train_tensor = torch.FloatTensor(y_train.values)  # For multi-label
X_test_tensor = torch.FloatTensor(X_test)
y_test_tensor = torch.FloatTensor(y_test.values)  # For multi-label


# DataLoader setup
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

test_dataset = TensorDataset(X_test_tensor, y_test_tensor)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

# Model setup
model = nn.Sequential(
    nn.Linear(len(features_to_include), 64),
    nn.BatchNorm1d(64),
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.BatchNorm1d(64),
    nn.ReLU(),
    nn.Linear(64, len(output_columns))
)

# Loss and optimizer setup
criterion = nn.BCEWithLogitsLoss() 
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
# Note: a fresh Adam optimizer is created for every batch below, discarding optimizer
# state and overriding the optimizer defined above; together with NaNs remaining in the
# inputs this is consistent with the nan losses printed after this cell (see the sketch
# following the output for a more conventional loop).
model.train()
for epoch in range(10):  # Number of epochs
    for inputs, labels in train_loader:
        lr = 0.0001 * (epoch + 1)
        optimizer = optim.Adam(model.parameters(), lr=lr)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
    print(f'Epoch {epoch+1}/10, Loss: {loss.item():.4f}')

# Testing loop
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for inputs, labels in test_loader:
        outputs = model(inputs)
        predicted = torch.sigmoid(outputs) > 0.5  # Applying sigmoid and threshold for multi-label
        total += labels.size(0)
        correct += (predicted == labels.byte()).all(dim=1).sum().item()  # Adjust for multi-label accuracy
accuracy = 100 * correct / total

print(f'Accuracy: {accuracy:.2f}%')
Epoch 1/10, Loss: nan
Epoch 2/10, Loss: nan
Epoch 3/10, Loss: nan
Epoch 4/10, Loss: nan
Epoch 5/10, Loss: nan
Epoch 6/10, Loss: nan
Epoch 7/10, Loss: nan
Epoch 8/10, Loss: nan
Epoch 9/10, Loss: nan
Epoch 10/10, Loss: nan
Accuracy: 0.00%
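
A hedged sketch of a more conventional training loop for the same model: one optimizer created outside the loops, and NaNs filled before converting to tensors (the NaN handling is an assumption about why the loss goes to nan). It reuses X_train, y_train, model, and criterion from the cell above.

In [ ]:
# Sketch: single optimizer, NaNs imputed before tensor conversion.
import numpy as np
import torch
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

X_train_clean = torch.FloatTensor(np.nan_to_num(np.asarray(X_train, dtype='float32'), nan=0.0))
y_train_clean = torch.FloatTensor(np.nan_to_num(y_train.values.astype('float32'), nan=0.0))

# drop_last avoids a size-1 batch, which BatchNorm1d cannot handle in train mode
train_loader_clean = DataLoader(TensorDataset(X_train_clean, y_train_clean),
                                batch_size=64, shuffle=True, drop_last=True)

optimizer = optim.Adam(model.parameters(), lr=1e-3)  # created once, reused every step

model.train()
for epoch in range(10):
    for inputs, labels in train_loader_clean:
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()
    print(f'Epoch {epoch+1}/10, Loss: {loss.item():.4f}')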
In [ ]:
subset_accuracy(y_test, predictions.toarray())
Out[ ]:
0.6770670826833073
In [ ]:
testing_df = pd.DataFrame(predictions.toarray())
testing_df.columns = y_test.columns
In [ ]:
print('Predicted    |    Real Value')

cols = y_test.columns
for i in range(1):
    for col in cols:
        print(testing_df.iloc[i][col],  y_test.iloc[i][col])
Predicted    |    Real Value
1.0 1.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
1.0 1.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
1.0 1.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
1.0 1.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
0.0 0.0
In [ ]:
print(predictions.toarray()[1])
[1. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0.]
In [ ]:
print(predictions)
  (0, 0)	1.0
  (4, 0)	1.0
  (5, 0)	1.0
  (6, 0)	1.0
  (7, 0)	1.0
  (8, 0)	1.0
  (12, 0)	1.0
  (14, 0)	1.0
  (21, 0)	1.0
  (23, 0)	1.0
  (27, 0)	1.0
  (30, 0)	1.0
  (36, 0)	1.0
  (41, 0)	1.0
  (43, 0)	1.0
  (45, 0)	1.0
  (52, 0)	1.0
  (56, 0)	1.0
  (57, 0)	1.0
  (58, 0)	1.0
  (62, 0)	1.0
  (69, 0)	1.0
  (72, 0)	1.0
  (76, 0)	1.0
  (79, 0)	1.0
  :	:
  (782, 51)	1.0
  (784, 51)	1.0
  (801, 51)	1.0
  (813, 51)	1.0
  (814, 51)	1.0
  (816, 51)	1.0
  (822, 51)	1.0
  (848, 51)	1.0
  (858, 51)	1.0
  (860, 51)	1.0
  (893, 51)	1.0
  (947, 51)	1.0
  (966, 51)	1.0
  (997, 51)	1.0
  (1019, 51)	1.0
  (1032, 51)	1.0
  (1054, 51)	1.0
  (1090, 51)	1.0
  (1115, 51)	1.0
  (1119, 51)	1.0
  (1137, 51)	1.0
  (1147, 51)	1.0
  (1209, 51)	1.0
  (1225, 51)	1.0
  (1242, 51)	1.0
In [ ]:
y_true.head()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[46], line 1
----> 1 y_true.head()

NameError: name 'y_true' is not defined
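
y_true was never defined in this notebook; presumably it was meant to hold the ground-truth test labels. A minimal sketch under that assumption, reusing the y_test DataFrame from the earlier split:

In [ ]:
# Sketch (assumption): treat the held-out test labels as y_true.
y_true = y_test.reset_index(drop=True)
y_true.head()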
In [ ]:
# Scale features
from sklearn.preprocessing import MinMaxScaler

s1 = MinMaxScaler(feature_range=(-1,1))
user_df_scaled = s1.fit_transform(user_df)
user_df_scaled = pd.DataFrame(user_df_scaled, columns=features_to_include)

# Create sequences
look_back = 4
generator = TimeseriesGenerator(user_df_scaled.values, user_df_scaled.values,
                                length=look_back, batch_size=1)
def create_lstm_model(input_shape, num_features):
    model = Sequential([
        LSTM(units = num_features, activation='relu', input_shape=input_shape, return_sequences=True),
        Dropout(0.2),
        Dense(num_features),
        LSTM(units=num_features, return_sequences=True),
        Dropout(0.2),        
        LSTM(units=num_features, return_sequences=True),
        Dense(num_features),
        Dropout(0.2),
        LSTM(units=num_features, return_sequences=True),
        Dense(num_features),
        Dropout(0.2),        
        LSTM(units=num_features),
        Dense(num_features),
        Activation('linear'),
        
    ])
    model.compile(optimizer='adam', loss='mse', metrics = ['accuracy'])  
    return model
model_path = 'LSTM_next_model.h5'
# Define and compile the LSTM model
model = create_lstm_model((look_back, len(features_to_include)), len(features_to_include))

import keras
if os.path.exists(model_path):
    model = load_model(model_path)  # assign the loaded model so it is actually used
else:
    model.fit(generator, epochs=1)
    model.save(model_path)
In [ ]:
def predict_from_df(df, model, features_to_include, look_back=4):
    df_filtered = df[features_to_include]
    
    if len(df_filtered) >= look_back:
        # Extract the last `look_back` rows for the prediction
        last_sequences = df_filtered[-look_back:].values.reshape((1, look_back, len(features_to_include)))
    else:
        raise ValueError(f"DataFrame must have at least {look_back} rows for prediction.")
    
    # Predict the next row using the LSTM model
    predictions = model.predict(last_sequences)
    
    
    return predictions

def compare_predictions_with_actual(df, model, features_to_include, look_back=3):
    # Note: this uses the global user_df and scaler s1 rather than the df argument,
    # and hard-codes rows 1:5 as the input window and row 4 as the comparison value.
    predictions = predict_from_df(user_df.iloc[1:5], model, features_to_include, look_back)
    predictions = s1.inverse_transform(predictions)
    actual_values = user_df.iloc[4][features_to_include].values  # Adjust index if needed
    
    return predictions, actual_values

# Assuming 'user_df' is already preprocessed appropriately, including scaling
predictions, actual_values = compare_predictions_with_actual(user_df, model, features_to_include, look_back=4)

# Now you can compare 'predictions' with 'actual_values'
# Note: If your data was scaled, you might need to inverse scale both predictions and actual values before comparison

print(f"{'Predictions':<15}   | {'Actual Values':<15} | {'Diff':<15} | {'Diff %':<15}")
print("-" * 47)  # Adjust the number based on the width of your columns

j = 0
for i in range(len(predictions[j])):
    p = predictions[j][i]
    a = actual_values[i]
    diff = a-p
    diff_p = ((p-a)/a)*100
    print(f"{p:.6f}{'':<9} | {a:<15} | {diff:.6f} | {diff_p:.6f}")
1/1 [==============================] - 1s 859ms/step
Predictions       | Actual Values   | Diff            | Diff %         
-----------------------------------------------
1.087502          | 0.992139        | -0.095363 | 9.611823
0.321921          | 0.008221        | -0.313700 | 3815.839554
0.173169          | -0.010007       | -0.183176 | -1830.479131
0.445987          | 0.017953        | -0.428034 | 2384.189988
0.838229          | 0.988784        | 0.150555 | -15.226278
1.023328          | 0.992739        | -0.030589 | 3.081304
1.255472          | 0.995041        | -0.260431 | 26.172922
1.498911          | 1.481067        | -0.017844 | 1.204784
6.612895          | 6.684577        | 0.071682 | -1.072342
4.992745          | 5.043079        | 0.050334 | -0.998082
1.474535          | 0.000594        | -1.473941 | 248138.173398
2.362392          | 0.00138         | -2.361012 | 171087.822370
1.889040          | 0.000869        | -1.888171 | 217280.896811
2.147585          | 0.013867        | -2.133718 | 15387.018931
0.849614          | 0.429784        | -0.419830 | 97.683986
4.013337          | 0.108995        | -3.904342 | 3582.129142
0.449318          | 0.354559        | -0.094759 | 26.725983
-0.031556          | -0.009355       | 0.022201 | 237.316455
0.014177          | 0.03863         | 0.024453 | -63.300321
-0.017402          | 0.991321        | 1.008723 | -101.755454
0.456519          | 0.004346        | -0.452173 | 10404.353118
0.346918          | 0.004645        | -0.342273 | 7368.640997
0.443765          | 0.008302        | -0.435463 | 5245.273394
-0.000225          | -0.06522        | -0.064995 | -99.654303
-0.031119          | -0.071359       | -0.040240 | -56.390348
0.018305          | -0.345674       | -0.363979 | -105.295327
1.537729          | 0.007862        | -1.529867 | 19459.002626
1.347964          | 0.017157        | -1.330807 | 7756.639107
1.943403          | 0.030883        | -1.912520 | 6192.793713
3.123033          | 0.042611        | -3.080422 | 7229.169862
0.917343          | 0.002787        | -0.914556 | 32815.077117
1.449819          | 0.004164        | -1.445655 | 34717.941594
1.988410          | 0.006372        | -1.982038 | 31105.431815
1.481999          | 0.726105        | -0.755894 | 104.102514
4.577291          | 4.771319        | 0.194028 | -4.066538
3.205980          | 4.166487        | 0.960507 | -23.053172
2.747242          | 3.367551        | 0.620309 | -18.420182
3.032277          | 3.929054        | 0.896777 | -22.824251
2.721468          | 3.223763        | 0.502295 | -15.581009
3.079525          | 4.54848         | 1.468955 | -32.295509
2.694658          | 1.581582        | -1.113076 | 70.377399
4.014255          | 3.15225         | -0.862005 | 27.345707
0.359544          | 0.010827        | -0.348717 | 3220.807795
-0.047386          | 5.2e-05         | 0.047438 | -91227.119958
-0.008391          | 0.000159        | 0.008550 | -5377.508647
0.152547          | -0.000403       | -0.152950 | -37952.876240
1.134649          | 0.018501        | -1.116148 | 6032.904603
1.307168          | 0.002943        | -1.304225 | 44316.166105
1.167371          | 0.002244        | -1.165127 | 51921.892738
0.010832          | -0.478131       | -0.488963 | -102.265437
0.013491          | -0.170299       | -0.183790 | -107.921990
0.010864          | 0.145422        | 0.134558 | -92.529241
224.925583          | 109.779389      | -115.146194 | 104.888718
61.624428          | 0.734185        | -60.890243 | 8293.583061
-26.361868          | 0.494123        | 26.855991 | -5435.082136
77.771667          | 0.986627        | -76.785040 | 7782.580497
221.473618          | 109.246493      | -112.227125 | 102.728354
239.007217          | 109.743838      | -129.263379 | 117.786458
276.287628          | 110.346706      | -165.940922 | 150.381401
1.701550          | 2.463646        | 0.762096 | -30.933674
4.537919          | 5.620379        | 1.082460 | -19.259554
2.681944          | 5.045849        | 2.363905 | -46.848506
2.527126          | 0.000193        | -2.526933 | 1309291.748102
2.538119          | 0.003022        | -2.535097 | 83888.048950
2.616662          | 0.002264        | -2.614398 | 115476.944587
0.230488          | 0.016411        | -0.214077 | 1304.474531
0.463765          | 0.430593        | -0.033172 | 7.703690
4.032182          | 0.8269          | -3.205282 | 387.626341
0.405648          | 0.174487        | -0.231161 | 132.480353
149.449905          | 73.192243       | -76.257662 | 104.188175
-141.063446          | -81.283276      | 59.780170 | 73.545473
-35.202381          | -9.281783       | 25.920598 | 279.263134
48.087212          | 0.75371         | -47.333502 | 6280.068144
57.660343          | 0.752368        | -56.907975 | 7563.848432
77.367630          | 0.832032        | -76.535598 | 9198.636351
-0.024326          | 0.092219        | 0.116545 | -126.378040
0.001552          | -0.130812       | -0.132364 | -101.186448
-0.011629          | 0.170884        | 0.182513 | -106.805483
0.962416          | 0.999895        | 0.037479 | -3.748324
0.494964          | 0.999896        | 0.504932 | -50.498478
0.494380          | 0.999893        | 0.505513 | -50.556734
0.492817          | 0.999876        | 0.507059 | -50.712175
0.355554          | 0.0             | -0.355554 | inf
4.520901          | 0.0             | -4.520901 | inf
0.288122          | 0.008365        | -0.279757 | 3344.372503
0.315624          | 0.013015        | -0.302609 | 2325.079203
73.248192          | 20.282          | -52.966192 | 261.148762
55.153439          | 0.0             | -55.153439 | inf
-0.963250          | -6.907755       | -5.944505 | -86.055523
0.001241          | 3e-06           | -0.001238 | 41281.937141
0.001531          | 4e-06           | -0.001527 | 38170.902587
-0.000485          | 0.0             | 0.000485 | -inf
0.001622          | 0.0             | -0.001622 | inf
0.004261          | 1e-06           | -0.004260 | 426034.839654
0.014713          | 2e-06           | -0.014711 | 735527.999529
1.941853          | 4.847216        | 2.905363 | -59.938792
-0.771159          | -0.530176       | 0.240983 | 45.453384
-0.272603          | 0.410488        | 0.683091 | -166.409487
-0.663139          | -0.51432        | 0.148819 | 28.935073
-0.259132          | -0.736442       | -0.477310 | -64.812984
-0.112731          | -0.275571       | -0.162840 | -59.091770
-0.459598          | -0.885504       | -0.425906 | -48.097627
-0.090833          | -0.415427       | -0.324594 | -78.134912
-0.265383          | -0.461418       | -0.196035 | -42.485325
-0.054731          | -0.208905       | -0.154174 | -73.800832
-0.192168          | -0.374458       | -0.182290 | -48.680912
-0.248281          | -0.445467       | -0.197186 | -44.265087
-0.104653          | -0.291257       | -0.186604 | -64.068498
4.785089          | 1.817652        | -2.967437 | 163.256635
1.070282          | 1.004382        | -0.065900 | 6.561282
0.629550          | 0.576957        | -0.052593 | 9.115581
0.522728          | 0.549768        | 0.027040 | -4.918510
0.405758          | 0.368244        | -0.037514 | 10.187282
0.339536          | 0.237837        | -0.101699 | 42.759762
0.452471          | 0.327434        | -0.125037 | 38.186974
0.304638          | 0.257177        | -0.047461 | 18.454755
0.304042          | 0.247374        | -0.056668 | 22.907636
0.225618          | 0.23933         | 0.013712 | -5.729151
0.227806          | 0.182181        | -0.045625 | 25.043613
0.264394          | 0.155518        | -0.108876 | 70.008626
0.243714          | 0.180659        | -0.063055 | 34.902652
-0.581565          | 8.149602        | 8.731167 | -107.136116
-6.225636          | -8.115568       | -1.889932 | -23.287741
0.498104          | 0.0             | -0.498104 | inf
0.515633          | 0.0             | -0.515633 | inf
0.485093          | 0.0             | -0.485093 | inf
0.491543          | 1.0             | 0.508457 | -50.845724
0.494612          | 1.0             | 0.505388 | -50.538802
0.505099          | 0.0             | -0.505099 | inf
0.503978          | 0.0             | -0.503978 | inf
0.507959          | 0.0             | -0.507959 | inf
0.507657          | 0.0             | -0.507657 | inf
0.491726          | 0.0             | -0.491726 | inf
0.502110          | 0.0             | -0.502110 | inf
0.501529          | 0.0             | -0.501529 | inf
0.492409          | 1.0             | 0.507591 | -50.759065
0.496987          | 0.0             | -0.496987 | inf
0.498913          | 0.0             | -0.498913 | inf
0.487711          | 1.0             | 0.512289 | -51.228896
0.500403          | 0.0             | -0.500403 | inf
0.496930          | 0.0             | -0.496930 | inf
0.504891          | 0.0             | -0.504891 | inf
0.506478          | 0.0             | -0.506478 | inf
0.498209          | 1.0             | 0.501791 | -50.179106
0.498195          | 0.0             | -0.498195 | inf
0.504348          | 0.0             | -0.504348 | inf
0.496296          | 1.0             | 0.503704 | -50.370422
0.506261          | 0.0             | -0.506261 | inf
0.494220          | 0.0             | -0.494220 | inf
0.530882          | 0.16            | -0.370882 | 231.800953
0.491874          | 0.0             | -0.491874 | inf
0.505057          | 0.0             | -0.505057 | inf
0.490880          | 0.0             | -0.490880 | inf
0.508880          | 1.0             | 0.491120 | -49.111956
0.490089          | 1.0             | 0.509911 | -50.991109
0.515844          | 0.0             | -0.515844 | inf
0.503154          | 0.0             | -0.503154 | inf
0.499329          | 0.0             | -0.499329 | inf
1448656384.000000          | 1448316936.0    | -339448.000000 | 0.023437
/var/folders/9d/r4wkb8dj54b5k6_vd_8stbq00000gn/T/ipykernel_93077/1789851892.py:37: RuntimeWarning: divide by zero encountered in double_scalars
  diff_p = ((p-a)/a)*100
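
The inf entries and the divide-by-zero warning come from percent differences where the actual value is 0. A small sketch of a safer relative-error summary, assuming the predictions and actual_values returned by the cell above:

In [ ]:
# Sketch: relative error that avoids dividing by zero actual values.
import numpy as np

preds = np.asarray(predictions[0], dtype=float)
actual = np.asarray(actual_values, dtype=float)
denom = np.where(np.abs(actual) > 1e-9, np.abs(actual), np.nan)  # NaN marks 'undefined' instead of inf
rel_err_pct = 100 * np.abs(preds - actual) / denom
print(f"Median relative error over defined entries: {np.nanmedian(rel_err_pct):.2f}%")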

Testing CNN And LSTM for prediction¶

In [ ]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Conv1D, MaxPooling1D, Dropout, Reshape
from tensorflow.keras.optimizers import Adam

# Adjust the input shape according to your dataset
input_shape = (X_train.shape[1], 1)  # Assuming non-sequential data for simplicity

model = Sequential([
    Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=input_shape),
    MaxPooling1D(pool_size=2),
    Conv1D(filters=128, kernel_size=3, activation='relu'),
    MaxPooling1D(pool_size=2),
    # Instead of Flatten, use Reshape or adjust the model so it's suitable for LSTM input
    # Reshape example (adjust the target shape according to your needs):
    # This line is illustrative; actual reshaping depends on the output shape of the previous layer
    Reshape((-1, 128)),  # Adjust the target shape
    LSTM(50, return_sequences=False),  # If you want the LSTM to output a sequence, set return_sequences=True
    Dropout(0.5),
    Dense(100, activation='relu'),
    Dense(len(label_columns), activation='sigmoid')  # Use 'sigmoid' for multi-label classification
])

model.compile(optimizer=Adam(learning_rate=0.001),
              loss='binary_crossentropy',  # Use 'binary_crossentropy' for multi-label classification
              metrics=['accuracy'])

model.summary()
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv1d (Conv1D)             (None, 33, 64)            256       
                                                                 
 max_pooling1d (MaxPooling1  (None, 16, 64)            0         
 D)                                                              
                                                                 
 conv1d_1 (Conv1D)           (None, 14, 128)           24704     
                                                                 
 max_pooling1d_1 (MaxPoolin  (None, 7, 128)            0         
 g1D)                                                            
                                                                 
 reshape (Reshape)           (None, 7, 128)            0         
                                                                 
 lstm (LSTM)                 (None, 50)                35800     
                                                                 
 dropout (Dropout)           (None, 50)                0         
                                                                 
 dense (Dense)               (None, 100)               5100      
                                                                 
 dense_1 (Dense)             (None, 52)                5252      
                                                                 
=================================================================
Total params: 71112 (277.78 KB)
Trainable params: 71112 (277.78 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [ ]:
# Reshape data for CNN if needed
X_train_reshaped = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
X_test_reshaped = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))
In [ ]:
# For arrays, assuming X_train_reshaped and y_train are your features and labels respectively
subset_size = 100  # Choose a small size for quick tests
X_train_subset = X_train_reshaped[:subset_size]
y_train_subset = y_train[:subset_size]
In [ ]:
# Testing Shape Issues
model_name = 'first_try.h5'
if os.path.exists(model_name):
    model = load_model(model_name)
else:
    history = model.fit(X_train, y_train,
                    epochs=1,  
                    batch_size=64, 
                    validation_split=0.2, 
                    verbose=1)  

    model.save(model_name)

test_loss, test_acc = model.evaluate(X_test_reshaped, y_test, verbose=2)
print(f'Test accuracy: {test_acc}, Test loss: {test_loss}')
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/Users/halladaykinsey/AAI-530/ExtraSensory_Combined_User_Data_v0.5.ipynb Cell 51 line 1
      6     history = model.fit(X_train, y_train,
      7                     epochs=1,
      8                     batch_size=64,
      9                     validation_split=0.2,
     10                     verbose=1)
     12     model.save(model_name)
---> 14 test_loss, test_acc = model.evaluate(X_test_reshaped, y_test, verbose=2)
     15 print(f'Test accuracy: {test_acc}, Test loss: {test_loss}')

File /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67     filtered_tb = _process_traceback_frames(e.__traceback__)
     68     # To get the full stack trace, call:
     69     # `tf.debugging.disable_traceback_filtering()`
---> 70     raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb

File /var/folders/8j/8zxjcfw125g4mfl6xvm5bl080000gn/T/__autograph_generated_filelqh3eeax.py:15, in outer_factory.<locals>.inner_factory.<locals>.tf__test_function(iterator)
     13 try:
     14     do_return = True
---> 15     retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
     16 except:
     17     do_return = False

ValueError: in user code:

    File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/keras/src/engine/training.py", line 2066, in test_function  *
        return step_function(self, iterator)
    File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/keras/src/engine/training.py", line 2049, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/keras/src/engine/training.py", line 2037, in run_step  **
        outputs = model.test_step(data)
    File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/keras/src/engine/training.py", line 1917, in test_step
        y_pred = self(x, training=False)
    File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/keras/src/engine/input_spec.py", line 298, in assert_input_compatibility
        raise ValueError(

    ValueError: Input 0 of layer "sequential_4" is incompatible with the layer: expected shape=(None, 159, 1), found shape=(None, 35, 1)
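
The loaded first_try.h5 model expects 159 timesteps (features) per sample while X_test_reshaped here has 35, so the saved model and the current feature set do not match. A hedged sketch of a guard that checks the shapes before evaluating, reusing the names from the cell above:

In [ ]:
# Sketch: verify the evaluation data matches the loaded model's expected input
# shape before calling evaluate(), instead of failing inside Keras.
expected_steps = model.input_shape[1]      # e.g. 159 for the saved model
actual_steps = X_test_reshaped.shape[1]    # e.g. 35 for the current feature set

if expected_steps != actual_steps:
    print(f"Feature count mismatch: model expects {expected_steps}, data has {actual_steps}. "
          "Rebuild the model or regenerate X_test with the matching feature set.")
else:
    test_loss, test_acc = model.evaluate(X_test_reshaped, y_test, verbose=2)
    print(f'Test accuracy: {test_acc}, Test loss: {test_loss}')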
In [ ]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Conv1D, MaxPooling1D, Dropout, BatchNormalization, Reshape, Bidirectional
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l2
from tensorflow.keras.callbacks import LearningRateScheduler
import math

def scheduler(epoch, lr):
    if epoch < 10:
        return lr
    else:
        return lr * math.exp(-0.1)

callback = LearningRateScheduler(scheduler)

model = Sequential([
    Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=input_shape, kernel_regularizer=l2(0.001)),
    BatchNormalization(),
    MaxPooling1D(pool_size=2),
    Conv1D(filters=128, kernel_size=3, activation='relu', kernel_regularizer=l2(0.001)),
    BatchNormalization(),
    MaxPooling1D(pool_size=2),
    Reshape((-1, 128)),  # Adjust based on the output shape of the previous layer
    Bidirectional(LSTM(100, return_sequences=False)),
    Dropout(0.5),
    Dense(100, activation='relu', kernel_regularizer=l2(0.001)),
    BatchNormalization(),
    Dense(len(label_columns), activation='sigmoid')  # Adjust based on your label columns
])

model.compile(optimizer=Adam(learning_rate=0.001),
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.summary()

model_name = 'second_try.h5'
if os.path.exists(model_name):
    model = load_model(model_name)
else:
    model.fit(X_train, y_train, epochs=1,batch_size=128,  validation_split=0.2)
    model.save(model_name)

test_loss, test_acc = model.evaluate(X_test_reshaped, y_test, verbose=2)
print(f'Test accuracy: {test_acc}, Test loss: {test_loss}')
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
Model: "sequential_9"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv1d_14 (Conv1D)          (None, 157, 64)           256       
                                                                 
 batch_normalization_5 (Bat  (None, 157, 64)           256       
 chNormalization)                                                
                                                                 
 max_pooling1d_14 (MaxPooli  (None, 78, 64)            0         
 ng1D)                                                           
                                                                 
 conv1d_15 (Conv1D)          (None, 76, 128)           24704     
                                                                 
 batch_normalization_6 (Bat  (None, 76, 128)           512       
 chNormalization)                                                
                                                                 
 max_pooling1d_15 (MaxPooli  (None, 38, 128)           0         
 ng1D)                                                           
                                                                 
 reshape_4 (Reshape)         (None, 38, 128)           0         
                                                                 
 bidirectional_2 (Bidirecti  (None, 200)               183200    
 onal)                                                           
                                                                 
 dropout_16 (Dropout)        (None, 200)               0         
                                                                 
 dense_22 (Dense)            (None, 100)               20100     
                                                                 
 batch_normalization_7 (Bat  (None, 100)               400       
 chNormalization)                                                
                                                                 
 dense_23 (Dense)            (None, 52)                5252      
                                                                 
=================================================================
Total params: 234680 (916.72 KB)
Trainable params: 234096 (914.44 KB)
Non-trainable params: 584 (2.28 KB)
_________________________________________________________________
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
2359/2359 - 50s - loss: nan - accuracy: 0.0472 - 50s/epoch - 21ms/step
Test accuracy: 0.04721081256866455, Test loss: nan
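
A nan test loss usually means NaNs are still flowing through the network, and the ExtraSensory features do contain NaNs for unavailable sensors. A minimal sketch, assuming X_train and X_test are NumPy arrays, that fills the NaNs before fitting or evaluating:

In [ ]:
# Sketch: fill NaN feature values; NaNs propagate through the network and turn the loss into nan.
import numpy as np

X_train_filled = np.nan_to_num(np.asarray(X_train, dtype='float32'), nan=0.0)
X_test_filled = np.nan_to_num(np.asarray(X_test, dtype='float32'), nan=0.0)
print(np.isnan(X_train_filled).any(), np.isnan(X_test_filled).any())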
In [ ]:
# Example fitting with the learning-rate scheduler callback defined above:
# model.fit(X_train, y_train, epochs=1, batch_size=128, validation_split=0.2, callbacks=[callback])
In [ ]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Conv1D, MaxPooling1D, Dropout, BatchNormalization, Reshape, Bidirectional
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l2
from tensorflow.keras.callbacks import LearningRateScheduler
import math

def scheduler(epoch, lr):
    if epoch < 10:
        return lr
    else:
        return lr * math.exp(-0.1)

callback = LearningRateScheduler(scheduler)

model = Sequential([
    Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=input_shape, kernel_regularizer=l2(0.001)),
    BatchNormalization(),
    MaxPooling1D(pool_size=2),
    Reshape((-1, 128)),  # Adjust based on the output shape of the previous layer
    Bidirectional(LSTM(100, return_sequences=False)),
    Dropout(0.5),
    Dense(100, activation='relu', kernel_regularizer=l2(0.001)),
    BatchNormalization(),
    Dense(len(label_columns), activation='sigmoid')  # Adjust based on your label columns
])

model.compile(optimizer=Adam(learning_rate=0.001),
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.summary()
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv1d_4 (Conv1D)           (None, 157, 64)           256       
                                                                 
 batch_normalization_3 (Bat  (None, 157, 64)           256       
 chNormalization)                                                
                                                                 
 max_pooling1d_4 (MaxPoolin  (None, 78, 64)            0         
 g1D)                                                            
                                                                 
 reshape_2 (Reshape)         (None, 39, 128)           0         
                                                                 
 bidirectional_1 (Bidirecti  (None, 200)               183200    
 onal)                                                           
                                                                 
 dropout_6 (Dropout)         (None, 200)               0         
                                                                 
 dense_8 (Dense)             (None, 100)               20100     
                                                                 
 batch_normalization_4 (Bat  (None, 100)               400       
 chNormalization)                                                
                                                                 
 dense_9 (Dense)             (None, 52)                5252      
                                                                 
=================================================================
Total params: 209464 (818.22 KB)
Trainable params: 209136 (816.94 KB)
Non-trainable params: 328 (1.28 KB)
_________________________________________________________________
In [ ]:
model_name = 'third_try.h5'
if os.path.exists(model_name):
    model = load_model(model_name)
else:
    model.fit(X_train, y_train, epochs=1,batch_size=128,  validation_split=0.2)
    model.save(model_name)
    # Evaluate the model
    test_loss, test_acc = model.evaluate(X_test_reshaped, y_test, verbose=2)
    print(f'Test accuracy: {test_acc}, Test loss: {test_loss}')

test_loss, test_acc = model.evaluate(X_test_reshaped, y_test, verbose=2)
print(f'Test accuracy: {test_acc}, Test loss: {test_loss}')
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
2359/2359 - 41s - loss: nan - accuracy: 0.0472 - 41s/epoch - 17ms/step
Test accuracy: 0.04721081256866455, Test loss: nan
In [ ]:
model_name = 'third_try_2.h5'
if os.path.exists(model_name):
    model = load_model(model_name)
else:
    model.fit(X_train, y_train, epochs=1,batch_size=20,  validation_split=0.2)
    model.save(model_name)
    # Evaluate the model
    test_loss, test_acc = model.evaluate(X_test_reshaped, y_test, verbose=2)
    print(f'Test accuracy: {test_acc}, Test loss: {test_loss}')
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
In [ ]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Conv1D, MaxPooling1D, Dropout, Reshape
from tensorflow.keras.optimizers import Adam

# Adjust the input shape according to your dataset:
# each feature column is treated as one "timestep" with a single channel
input_shape = (X_train.shape[1], 1)

model = Sequential([
    LSTM(len(label_columns), return_sequences=True, input_shape=input_shape),  # return_sequences=True keeps the full sequence for the Conv1D stack below
    Dropout(0.2),
    Conv1D(filters=len(label_columns), kernel_size=2, activation='relu'),
    MaxPooling1D(pool_size=7),
    Conv1D(filters=128, kernel_size=3, activation='relu'),
    MaxPooling1D(pool_size=2),
    Conv1D(filters=128, kernel_size=5, activation='relu'),
    MaxPooling1D(pool_size=2),
    # Reshape (instead of Flatten) keeps a (timesteps, features) layout for the LSTM below;
    # with the shapes above it maps (3, 128) -> (3, 128), i.e. it is effectively a no-op
    Reshape((-1, 128)),
    LSTM(50, return_sequences=False),  # collapse the sequence into a single vector
    Dropout(0.2),
    Dense(100, activation='relu'),
    Dense(len(label_columns), activation='sigmoid')  # sigmoid outputs for multi-label classification
])

model.compile(optimizer=Adam(learning_rate=0.001),
              loss='binary_crossentropy',  # Use 'binary_crossentropy' for multi-label classification
              metrics=['accuracy'])

model.summary()
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 lstm_8 (LSTM)               (None, 159, 52)           11232     
                                                                 
 dropout_7 (Dropout)         (None, 159, 52)           0         
                                                                 
 conv1d_5 (Conv1D)           (None, 158, 52)           5460      
                                                                 
 max_pooling1d_5 (MaxPoolin  (None, 22, 52)            0         
 g1D)                                                            
                                                                 
 conv1d_6 (Conv1D)           (None, 20, 128)           20096     
                                                                 
 max_pooling1d_6 (MaxPoolin  (None, 10, 128)           0         
 g1D)                                                            
                                                                 
 conv1d_7 (Conv1D)           (None, 6, 128)            82048     
                                                                 
 max_pooling1d_7 (MaxPoolin  (None, 3, 128)            0         
 g1D)                                                            
                                                                 
 reshape_3 (Reshape)         (None, 3, 128)            0         
                                                                 
 lstm_9 (LSTM)               (None, 50)                35800     
                                                                 
 dropout_8 (Dropout)         (None, 50)                0         
                                                                 
 dense_10 (Dense)            (None, 100)               5100      
                                                                 
 dense_11 (Dense)            (None, 52)                5252      
                                                                 
=================================================================
Total params: 164988 (644.48 KB)
Trainable params: 164988 (644.48 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
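For reference, the sequence lengths in the summary above can be traced layer by layer (the 159 input feature columns are treated as 159 timesteps of width 1): a Conv1D with kernel size k and no padding shortens the sequence by k-1, and a MaxPooling1D with pool size p floor-divides it, giving 159 -> 158 (kernel 2) -> 22 (pool 7) -> 20 (kernel 3) -> 10 (pool 2) -> 6 (kernel 5) -> 3 (pool 2). The Reshape((-1, 128)) therefore maps (3, 128) to (3, 128), i.e. it is effectively a no-op before the second LSTM.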
In [ ]:
model_name = 'fourth_try_2.h5'
if os.path.exists(model_name):
    model = load_model(model_name)
else:
    model.fit(X_train, y_train, epochs=1, validation_split=0.2)
    model.save(model_name)
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
In [ ]:
y_train.shape[1]
Out[ ]:
52
In [ ]:
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Conv1D, MaxPooling1D, Dense, Flatten
from keras.optimizers import Adam

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Adjust the input shape according to your dataset.
# LSTM input must be (samples, timesteps, features); here each feature column
# is treated as one timestep with a single feature channel.
input_shape = (X_train.shape[1], 1)

model = Sequential()
# Start with an LSTM layer to process sequences
model.add(LSTM(units=64, return_sequences=True, input_shape=input_shape))
model.add(Dropout(0.2))

# Followed by CNN layers for feature extraction from sequences processed by LSTM
model.add(Conv1D(filters=64, kernel_size=2, activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Conv1D(filters=128, kernel_size=3, activation='relu'))
model.add(MaxPooling1D(pool_size=2))

# Flatten the output to feed into a dense layer
model.add(Flatten())
# Additional dense layers or LSTM layers can be added here if needed
# Example: model.add(LSTM(50, return_sequences=False))

model.add(Dense(100, activation='relu'))
model.add(Dense(y_train.shape[1], activation='sigmoid'))  # one sigmoid unit per label (multi-label, binary-indicator targets)

model.compile(optimizer=Adam(learning_rate=0.001),
              loss='binary_crossentropy',  # Adjust the loss function as per your problem
              metrics=['accuracy'])

model.summary()
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 lstm_10 (LSTM)              (None, 159, 64)           16896     
                                                                 
 dropout_9 (Dropout)         (None, 159, 64)           0         
                                                                 
 conv1d_8 (Conv1D)           (None, 158, 64)           8256      
                                                                 
 max_pooling1d_8 (MaxPoolin  (None, 79, 64)            0         
 g1D)                                                            
                                                                 
 conv1d_9 (Conv1D)           (None, 77, 128)           24704     
                                                                 
 max_pooling1d_9 (MaxPoolin  (None, 38, 128)           0         
 g1D)                                                            
                                                                 
 flatten (Flatten)           (None, 4864)              0         
                                                                 
 dense_12 (Dense)            (None, 100)               486500    
                                                                 
 dense_13 (Dense)            (None, 52)                5252      
                                                                 
=================================================================
Total params: 541608 (2.07 MB)
Trainable params: 541608 (2.07 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
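Most of the extra size compared with the previous model comes from the Flatten layer: flattening the (38, 128) feature map yields 38 * 128 = 4,864 units, so the following Dense(100) layer alone accounts for 4,864 * 100 + 100 = 486,500 of the 541,608 total parameters.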
In [ ]:
model_name = 'fifth_try_3.h5'
if os.path.exists(model_name):
    model = load_model(model_name)
else:
    model.fit(X_train, y_train, epochs=1, validation_split=0.2)
    model.save(model_name)
2024-02-13 07:19:00.028844: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:961] model_pruner failed: INVALID_ARGUMENT: Graph does not contain terminal node Adam/AssignAddVariableOp.
7547/7547 [==============================] - 222s 29ms/step - loss: 0.1694 - accuracy: 0.0482 - val_loss: 0.1682 - val_accuracy: 0.0476
/Users/zaina/miniconda3/lib/python3.10/site-packages/keras/src/engine/training.py:3103: UserWarning: You are saving your model as an HDF5 file via `model.save()`. This file format is considered legacy. We recommend using instead the native Keras format, e.g. `model.save('my_model.keras')`.
  saving_api.save_model(
In [ ]:
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Conv1D, MaxPooling1D, Dense, Flatten
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import Callback
from sklearn.metrics import precision_score, recall_score, f1_score

# Assuming 'X' and 'y' are your features and labels, respectively

# Data Preprocessing
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)  # Normalize features

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Model Definition
model = Sequential([
    LSTM(64, return_sequences=True, input_shape=(X_train.shape[1], 1)),
    Dropout(0.2),
    Conv1D(64, kernel_size=2, activation='relu'),
    MaxPooling1D(pool_size=2),
    Conv1D(128, kernel_size=3, activation='relu'),
    MaxPooling1D(pool_size=2),
    Flatten(),
    Dense(100, activation='relu'),
    Dense(y_train.shape[1], activation='sigmoid')  # Output layer
])

model.compile(optimizer=Adam(learning_rate=0.001),
              loss='binary_crossentropy',
              metrics=['accuracy'])  # Add other metrics as needed

# Custom Callback for Precision, Recall, F1 Score
class MetricsCallback(Callback):
    def on_epoch_end(self, epoch, logs=None):
        val_predict = (np.asarray(self.model.predict(X_test))).round()
        val_targ = y_test
        _val_precision = precision_score(val_targ, val_predict, average='micro')
        _val_recall = recall_score(val_targ, val_predict, average='micro')
        _val_f1 = f1_score(val_targ, val_predict, average='micro')
        print(f' — val_precision: {_val_precision:.4f} — val_recall: {_val_recall:.4f} — val_f1: {_val_f1:.4f}')

# Model Training
model.fit(X_train, y_train,
          validation_data=(X_test, y_test),
          epochs=1,  # Adjust number of epochs as necessary
          batch_size=32,  # Adjust batch size as necessary
          callbacks=[MetricsCallback()])

# Note: This is a simplified example. In practice, you might need to adjust the model architecture, preprocessing steps,
# and training parameters based on the specifics of your dataset and task.

model_name = 'fifth_try_4.h5'
if os.path.exists(model_name):
    model = load_model(model_name)
else:
    model.fit(X_train, y_train, validation_split=0.2, epochs=1, batch_size=32, callbacks=[MetricsCallback()])
    model.save(model_name)  # cache the trained model, as in the earlier cells
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
2024-02-13 08:30:31.580864: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:961] model_pruner failed: INVALID_ARGUMENT: Graph does not contain terminal node Adam/AssignAddVariableOp.
2359/2359 [==============================] - 21s 9ms/step
/Users/zaina/miniconda3/lib/python3.10/site-packages/sklearn/metrics/_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
 — val_precision: 0.0000 — val_recall: 0.0000 — val_f1: 0.0000
9434/9434 [==============================] - 302s 32ms/step - loss: 0.1690 - accuracy: 0.0474 - val_loss: 0.1682 - val_accuracy: 0.0472
2359/2359 [==============================] - 21s 9ms/step
/Users/zaina/miniconda3/lib/python3.10/site-packages/sklearn/metrics/_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
 — val_precision: 0.0000 — val_recall: 0.0000 — val_f1: 0.0000
7547/7547 [==============================] - 244s 32ms/step - loss: 0.1677 - accuracy: 0.0468 - val_loss: 0.1676 - val_accuracy: 0.0476
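The zero precision/recall/F1 above is consistent with every predicted probability falling below 0.5: round() then produces an all-zero prediction matrix, and sklearn warns that precision is undefined. A small follow-up sketch (assuming the model, X_test and y_test from the cell above) that passes zero_division=0 and sweeps a few decision thresholds instead of hard-coding 0.5:

In [ ]:
from sklearn.metrics import f1_score

y_prob = model.predict(X_test)  # per-label probabilities

# 0.5 is rarely the best cut-off for imbalanced multi-label targets;
# report micro-averaged F1 at a few candidate thresholds instead.
for thr in (0.1, 0.2, 0.3, 0.5):
    y_pred = (y_prob >= thr).astype(int)
    f1 = f1_score(y_test, y_pred, average='micro', zero_division=0)
    print(f"threshold={thr:.1f}  micro-F1={f1:.4f}")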
In [ ]:
hierarchy = build_hierarchy(X.columns)
formatted_hierarchy = format_hierarchy(hierarchy)
print(formatted_hierarchy)
- raw_acc:
  - magnitude_stats:
    -  mean
    -  std
    -  moment3
    -  moment4
    -  percentile25
    -  percentile50
    -  percentile75
    -  value_entropy
    -  time_entropy
  - magnitude_spectrum:
    -  log_energy_band0
    -  log_energy_band1
    -  log_energy_band2
    -  log_energy_band3
    -  log_energy_band4
    -  spectral_entropy
  - magnitude_autocorrelation:
    -  period
    -  normalized_ac
  - 3d:
    -  mean_x
    -  mean_y
    -  mean_z
    -  std_x
    -  std_y
    -  std_z
    -  ro_xy
    -  ro_xz
    -  ro_yz
- proc_gyro:
  - magnitude_stats:
    -  mean
    -  std
    -  moment3
    -  moment4
    -  percentile25
    -  percentile50
    -  percentile75
    -  value_entropy
    -  time_entropy
  - magnitude_spectrum:
    -  log_energy_band0
    -  log_energy_band1
    -  log_energy_band2
    -  log_energy_band3
    -  log_energy_band4
    -  spectral_entropy
  - magnitude_autocorrelation:
    -  period
    -  normalized_ac
  - 3d:
    -  mean_x
    -  mean_y
    -  mean_z
    -  std_x
    -  std_y
    -  std_z
    -  ro_xy
    -  ro_xz
    -  ro_yz
- raw_magnet:
  - magnitude_stats:
    -  mean
    -  std
    -  moment3
    -  moment4
    -  percentile25
    -  percentile50
    -  percentile75
    -  value_entropy
    -  time_entropy
  - magnitude_spectrum:
    -  log_energy_band0
    -  log_energy_band1
    -  log_energy_band2
    -  log_energy_band3
    -  log_energy_band4
    -  spectral_entropy
  - magnitude_autocorrelation:
    -  period
    -  normalized_ac
  - 3d:
    -  mean_x
    -  mean_y
    -  mean_z
    -  std_x
    -  std_y
    -  std_z
    -  ro_xy
    -  ro_xz
    -  ro_yz
  -  avr_cosine_similarity_lag_range0
  -  avr_cosine_similarity_lag_range1
  -  avr_cosine_similarity_lag_range2
  -  avr_cosine_similarity_lag_range3
  -  avr_cosine_similarity_lag_range4
- location:
  -  num_valid_updates
  -  log_latitude_range
  -  log_longitude_range
  -  best_horizontal_accuracy
  -  diameter
  -  log_diameter
- location_quick_features:
  -  std_lat
  -  std_long
  -  lat_change
  -  long_change
  -  mean_abs_lat_deriv
  -  mean_abs_long_deriv
- audio_naive:
  - mfcc0:
    -  mean
    -  std
  - mfcc1:
    -  mean
    -  std
  - mfcc2:
    -  mean
    -  std
  - mfcc3:
    -  mean
    -  std
  - mfcc4:
    -  mean
    -  std
  - mfcc5:
    -  mean
    -  std
  - mfcc6:
    -  mean
    -  std
  - mfcc7:
    -  mean
    -  std
  - mfcc8:
    -  mean
    -  std
  - mfcc9:
    -  mean
    -  std
  - mfcc10:
    -  mean
    -  std
  - mfcc11:
    -  mean
    -  std
  - mfcc12:
    -  mean
    -  std
- audio_properties:
  -  max_abs_value
  -  normalization_multiplier
- discrete:
  - app_state:
    -  is_active
    -  is_inactive
    -  is_background
    -  missing
  - battery_plugged:
    -  is_ac
    -  is_usb
    -  is_wireless
    -  missing
  - battery_state:
    -  is_unknown
    -  is_unplugged
    -  is_not_charging
    -  is_discharging
    -  is_charging
    -  is_full
    -  missing
  - on_the_phone:
    -  is_False
    -  is_True
    -  missing
  - ringer_mode:
    -  is_normal
    -  is_silent_no_vibrate
    -  is_silent_with_vibrate
    -  missing
  - wifi_status:
    -  is_not_reachable
    -  is_reachable_via_wifi
    -  is_reachable_via_wwan
    -  missing
  - time_of_day:
    -  between0and6
    -  between3and9
    -  between6and12
    -  between9and15
    -  between12and18
    -  between15and21
    -  between18and24
    -  between21and3
- lf_measurements:
  -  battery_level
-  timestamp_numeric
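The build_hierarchy and format_hierarchy helpers called above are not shown in this section; a minimal sketch that reproduces the nesting printed here, assuming they simply split each column name on ':' and group the parts:

In [ ]:
from collections import OrderedDict

def build_hierarchy(columns):
    """Sketch: group column names like 'raw_acc:magnitude_stats:mean' into a nested dict."""
    tree = OrderedDict()
    for col in columns:
        parts = col.split(':')
        node = tree
        for part in parts[:-1]:
            node = node.setdefault(part, OrderedDict())
        node.setdefault(parts[-1], None)  # leaf
    return tree

def format_hierarchy(tree, indent=0):
    """Sketch: render the nested dict as the indented bullet list shown above."""
    lines = []
    for key, child in tree.items():
        if child is None:
            lines.append('  ' * indent + f'-  {key}')
        else:
            lines.append('  ' * indent + f'- {key}:')
            lines.append(format_hierarchy(child, indent + 1))
    return '\n'.join(lines)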

Testing Batch Sizes¶

In [ ]:
from tensorflow.keras.models import clone_model  # needed below to copy the architecture per batch size
from tensorflow.keras.optimizers import Adam

def find_best_batch_size(model, X_train, y_train, X_test, y_test, batch_sizes):
    """
    Trains a given model using different batch sizes, evaluates performance on test data,
    stores each trained model, and returns the best batch size along with its accuracy and a dictionary of models.

    Parameters:
    - model: The initial model to be trained.
    - X_train, y_train: Training data and labels.
    - X_test, y_test: Test data and labels.
    - batch_sizes: List of batch sizes to test.

    Returns:
    - best_batch_size: The batch size yielding the highest accuracy on test data.
    - best_acc: The highest accuracy achieved on test data.
    - models_dict: A dictionary of saved model filenames keyed by their batch sizes.
    """
    models_dict = {}
    best_acc = 0
    best_batch_size = None

    for batch_size in batch_sizes:
        print(f"Training with batch size: {batch_size}")
        # Clone the original architecture and re-compile it with a fresh optimizer so that
        # optimizer state is not shared between batch-size runs
        model_clone = clone_model(model)
        model_clone.compile(optimizer=Adam(learning_rate=0.001), loss=model.loss, metrics=['accuracy'])
        
        # Fit the model
        model_clone.fit(X_train, y_train,
                        epochs=2, 
                        batch_size=batch_size,
                        validation_split=0.2,
                        verbose=1)
        
        # Evaluate the model
        test_loss, test_acc = model_clone.evaluate(X_test, y_test, verbose=2)
        print(f"Test accuracy: {test_acc}, Test loss: {test_loss}")
        
        # Save the model
        model_file_name = f'ExtraSensory_CNN_LSTM_bs{batch_size}.h5'
        model_clone.save(model_file_name)
        models_dict[batch_size] = model_file_name
        
        # Update best model if current is better
        if test_acc > best_acc:
            best_acc = test_acc
            best_batch_size = batch_size

    print(f"Best Batch Size: {best_batch_size} with Test Accuracy: {best_acc}")
    return best_batch_size, best_acc, models_dict

# Example usage:
batch_sizes = [128, 64, 16, 4, 1, None]  # None falls back to the Keras default batch size (32)
# Call the function and store its return values
best_batch_size, best_acc, models_dict = find_best_batch_size(model, X_train_reshaped, y_train, X_test_reshaped, y_test, batch_sizes)

# Now `models_dict` is available outside of the function
print("Available models and their batch sizes:")
for batch_size, model_path in models_dict.items():
    print(f"Batch Size: {batch_size}, Model Path: {model_path}")

# You can load any model from `models_dict` for further use
# selected_model_path = models_dict[best_batch_size]
# loaded_model = load_model(selected_model_path)
In [ ]:
# Evaluate the model
test_loss, test_acc = model.evaluate(X_test_reshaped, y_test, verbose=2)
print(f'Test accuracy: {test_acc}, Test loss: {test_loss}')
2359/2359 - 13s - loss: 0.1687 - accuracy: 0.0472 - 13s/epoch - 6ms/step
Test accuracy: 0.04721081256866455, Test loss: 0.16865016520023346
In [ ]:
model.save('ExtraSensory_CNN_LSTM_Model_v2.h5')
In [ ]:
# Train the model
history = model.fit(X_train_reshaped, y_train,
                    epochs=10, 
                    batch_size=64,
                    validation_split=0.2,
                    verbose=1)
Epoch 1/10
3774/3774 [==============================] - 98s 26ms/step - loss: 0.1721 - accuracy: 0.0507 - val_loss: 0.1682 - val_accuracy: 0.0476
Epoch 2/10
3774/3774 [==============================] - 64s 17ms/step - loss: 0.1692 - accuracy: 0.0539 - val_loss: 0.1683 - val_accuracy: 0.0476
Epoch 3/10
3774/3774 [==============================] - 64s 17ms/step - loss: 0.1693 - accuracy: 0.0538 - val_loss: 0.1680 - val_accuracy: 0.0476
Epoch 4/10
3774/3774 [==============================] - 64s 17ms/step - loss: 0.1693 - accuracy: 0.0554 - val_loss: 0.1680 - val_accuracy: 0.0476
Epoch 5/10
3774/3774 [==============================] - 64s 17ms/step - loss: 0.1694 - accuracy: 0.0572 - val_loss: 0.1683 - val_accuracy: 0.0476
Epoch 6/10
3774/3774 [==============================] - 64s 17ms/step - loss: 0.1695 - accuracy: 0.0586 - val_loss: 0.1682 - val_accuracy: 0.0476
Epoch 7/10
3774/3774 [==============================] - 64s 17ms/step - loss: 0.1696 - accuracy: 0.0593 - val_loss: 0.1694 - val_accuracy: 0.0476
Epoch 8/10
3774/3774 [==============================] - 64s 17ms/step - loss: 0.1697 - accuracy: 0.0593 - val_loss: 0.1684 - val_accuracy: 0.0476
Epoch 9/10
3774/3774 [==============================] - 65s 17ms/step - loss: 0.1698 - accuracy: 0.0610 - val_loss: 0.1681 - val_accuracy: 0.0476
Epoch 10/10
3774/3774 [==============================] - 65s 17ms/step - loss: 0.1698 - accuracy: 0.0625 - val_loss: 0.1681 - val_accuracy: 0.0476
In [ ]:
# Train the model
history = model.fit(X_train_reshaped, y_train,
                    epochs=2, 
                    batch_size=20,
                    validation_split=0.2,
                    verbose=1)
Epoch 1/2
12075/12075 [==============================] - 219s 18ms/step - loss: 0.1717 - accuracy: 0.0778 - val_loss: 0.1700 - val_accuracy: 0.0476
Epoch 2/2
12075/12075 [==============================] - 191s 16ms/step - loss: 0.1720 - accuracy: 0.0802 - val_loss: 0.1695 - val_accuracy: 0.0476
In [ ]:
# Evaluate the model
test_loss, test_acc = model.evaluate(X_test_reshaped, y_test, verbose=2)
print(f'Test accuracy: {test_acc}, Test loss: {test_loss}')
2359/2359 - 15s - loss: 0.1700 - accuracy: 0.0472 - 15s/epoch - 6ms/step
Test accuracy: 0.04721081256866455, Test loss: 0.16999678313732147
In [ ]:
model.save('ExtraSensory_CNN_LSTM_Model_v2_bs_20.h5')
/Users/zaina/miniconda3/lib/python3.10/site-packages/keras/src/engine/training.py:3103: UserWarning: You are saving your model as an HDF5 file via `model.save()`. This file format is considered legacy. We recommend using instead the native Keras format, e.g. `model.save('my_model.keras')`.
  saving_api.save_model(
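As the warning suggests, the legacy-HDF5 message disappears if the model is saved in the native Keras format instead:

In [ ]:
# Native Keras format, as recommended by the warning above
model.save('ExtraSensory_CNN_LSTM_Model_v2_bs_20.keras')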
In [ ]:
# Train the model
history = model.fit(X_train_reshaped, y_train,
                    epochs=2, 
                    batch_size=2,
                    validation_split=0.2,
                    verbose=1)
Epoch 1/2
 21579/120750 [====>.........................] - ETA: 22:30 - loss: 0.1824 - accuracy: 0.1195
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
Cell In[230], line 2
      1 # Train the model
----> 2 history = model.fit(X_train_reshaped, y_train,
      3                     epochs=2, 
      4                     batch_size=2,
      5                     validation_split=0.2,
      6                     verbose=1)

[... TensorFlow/Keras internal frames omitted ...]

KeyboardInterrupt: 
In [ ]:
# Evaluate the model
test_loss, test_acc = model.evaluate(X_test_reshaped, y_test, verbose=2)
print(f'Test accuracy: {test_acc}, Test loss: {test_loss}')
In [ ]:
model.save('ExtraSensory_CNN_LSTM_Model_v2_bs_2.h5')
In [ ]:
# Train the model
history = model.fit(X_train_reshaped, y_train,
                    epochs=2, 
                    validation_split=0.2,
                    verbose=1)
# Evaluate the model
test_loss, test_acc = model.evaluate(X_test_reshaped, y_test, verbose=2)
print(f'Test accuracy: {test_acc}, Test loss: {test_loss}')
model.save('ExtraSensory_CNN_LSTM_Model_v2_bs_1.h5')

Testing Thresholds¶

In [ ]:
# Calculate the total NaN count for each feature across all users
total_nan_counts = nan_counts_per_user.sum()

# Total number of rows in the original DataFrame `X_with_users`:
# each feature has exactly one entry (possibly NaN) per row, so this is the
# denominator for the per-feature missing-data percentage below
total_entries_per_feature = len(X_with_users)

# Calculate the percentage of missing data for each feature
percentage_missing = (total_nan_counts / total_entries_per_feature) * 100


def testing_threshold(threshold, testing_user, fill_type= 'ffill', epochs = 1 ):

    # Identify columns whose percentage of missing data exceeds the threshold (e.g., 1 for 1%)
    columns_to_remove = percentage_missing[percentage_missing > threshold].index.tolist()

    print("Removing ", len(columns_to_remove),"columns out of ", len(X_with_users.columns))
    # Print out the columns to remove
    # print("Columns to remove due to excessive missing data:", columns_to_remove)

    features_to_include = [feature for feature in features if feature not in columns_to_remove]

    # First User 
    user_df = X_with_users[X_with_users['user_id'] == users[testing_user]]
    
    if fill_type == 'ffill':
    # Forward fill
        user_df = user_df[features_to_include].ffill()
    elif fill_type == 'mean':
        # Fill missing values with the mean of each column
        mean_values = user_df[features_to_include].mean()
        user_df = user_df[features_to_include].fillna(mean_values)
    elif fill_type == 'median':
        # Fill missing values with the median of each column
        median_values = user_df[features_to_include].median()
        user_df = user_df[features_to_include].fillna(median_values)
    elif fill_type == 'zero':
        # Fill missing values with zero
        user_df = user_df[features_to_include].fillna(0)
    else:
        # Unknown fill_type: keep only the selected columns but leave missing values untouched
        print("Invalid fill_type. No filling applied.")
        user_df = user_df[features_to_include]

    scaler = StandardScaler()
    user_df[features_to_include] = scaler.fit_transform(user_df)

    # Define LSTM model architecture
    def create_lstm_model(input_shape):
        model = Sequential([
            LSTM(50, activation='relu', input_shape=input_shape),
            Dense(len(features_to_include), activation='relu') 
        ])
        model.compile(optimizer=tf.keras.optimizers.legacy.Adam(learning_rate=0.001, clipvalue=0.5), loss='mse')

        return model


    # Assuming timestamps are sorted; if not, sort user_df by timestamp here
    user_df.sort_values('timestamp', inplace=True)

    # Convert user_df to sequences for LSTM
    look_back = 3 
    generator = TimeseriesGenerator(user_df[features_to_include].values, user_df[features_to_include].values,
                                    length=look_back, batch_size=1)

    # Create and train LSTM model on the selected user's data
    model = create_lstm_model((look_back, len(features_to_include)))
    model.fit(generator, epochs=epochs, verbose=1)  # Adjust epochs and verbosity as needed

    return model, features_to_include
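For clarity on what testing_threshold actually trains: TimeseriesGenerator(data, targets, length=look_back) pairs each window of look_back consecutive rows with the row that immediately follows it, so the LSTM here learns to predict the next row's (scaled) feature vector from the previous three. A toy illustration of that pairing (hypothetical values, independent of the ExtraSensory data):

In [ ]:
import numpy as np
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator

toy = np.arange(12, dtype=float).reshape(6, 2)   # 6 rows ("minutes"), 2 features
gen = TimeseriesGenerator(toy, toy, length=3, batch_size=1)

x0, y0 = gen[0]
print(x0.shape, y0.shape)   # (1, 3, 2) (1, 2)
print(x0)                   # rows 0..2 of `toy`
print(y0)                   # row 3 of `toy`: the target is the row right after the window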
In [ ]:
 
Out[ ]:
numpy.int64
In [ ]:
models , features_for_model = {} , {}
model_name = "threshold0_median_epoch1"
models[model_name], features_for_model[model_name]  = testing_threshold(0, -1, fill_type = 'median')
best_model_one_so_far = 'threshold0_median_epoch1'
print(features_for_model[best_model_one_so_far])
192 out of  279
Columns to remove due to excessive missing data: ['raw_acc:magnitude_stats:mean', 'raw_acc:magnitude_stats:std', 'raw_acc:magnitude_stats:moment3', 'raw_acc:magnitude_stats:moment4', 'raw_acc:magnitude_stats:percentile25', 'raw_acc:magnitude_stats:percentile50', 'raw_acc:magnitude_stats:percentile75', 'raw_acc:magnitude_stats:value_entropy', 'raw_acc:magnitude_stats:time_entropy', 'raw_acc:magnitude_spectrum:log_energy_band0', 'raw_acc:magnitude_spectrum:log_energy_band1', 'raw_acc:magnitude_spectrum:log_energy_band2', 'raw_acc:magnitude_spectrum:log_energy_band3', 'raw_acc:magnitude_spectrum:log_energy_band4', 'raw_acc:magnitude_spectrum:spectral_entropy', 'raw_acc:magnitude_autocorrelation:period', 'raw_acc:magnitude_autocorrelation:normalized_ac', 'raw_acc:3d:mean_x', 'raw_acc:3d:mean_y', 'raw_acc:3d:mean_z', 'raw_acc:3d:std_x', 'raw_acc:3d:std_y', 'raw_acc:3d:std_z', 'raw_acc:3d:ro_xy', 'raw_acc:3d:ro_xz', 'raw_acc:3d:ro_yz', 'proc_gyro:magnitude_stats:mean', 'proc_gyro:magnitude_stats:std', 'proc_gyro:magnitude_stats:moment3', 'proc_gyro:magnitude_stats:moment4', 'proc_gyro:magnitude_stats:percentile25', 'proc_gyro:magnitude_stats:percentile50', 'proc_gyro:magnitude_stats:percentile75', 'proc_gyro:magnitude_stats:value_entropy', 'proc_gyro:magnitude_stats:time_entropy', 'proc_gyro:magnitude_spectrum:log_energy_band0', 'proc_gyro:magnitude_spectrum:log_energy_band1', 'proc_gyro:magnitude_spectrum:log_energy_band2', 'proc_gyro:magnitude_spectrum:log_energy_band3', 'proc_gyro:magnitude_spectrum:log_energy_band4', 'proc_gyro:magnitude_spectrum:spectral_entropy', 'proc_gyro:magnitude_autocorrelation:period', 'proc_gyro:magnitude_autocorrelation:normalized_ac', 'proc_gyro:3d:mean_x', 'proc_gyro:3d:mean_y', 'proc_gyro:3d:mean_z', 'proc_gyro:3d:std_x', 'proc_gyro:3d:std_y', 'proc_gyro:3d:std_z', 'proc_gyro:3d:ro_xy', 'proc_gyro:3d:ro_xz', 'proc_gyro:3d:ro_yz', 'raw_magnet:magnitude_stats:mean', 'raw_magnet:magnitude_stats:std', 'raw_magnet:magnitude_stats:moment3', 'raw_magnet:magnitude_stats:moment4', 'raw_magnet:magnitude_stats:percentile25', 'raw_magnet:magnitude_stats:percentile50', 'raw_magnet:magnitude_stats:percentile75', 'raw_magnet:magnitude_stats:value_entropy', 'raw_magnet:magnitude_stats:time_entropy', 'raw_magnet:magnitude_spectrum:log_energy_band0', 'raw_magnet:magnitude_spectrum:log_energy_band1', 'raw_magnet:magnitude_spectrum:log_energy_band2', 'raw_magnet:magnitude_spectrum:log_energy_band3', 'raw_magnet:magnitude_spectrum:log_energy_band4', 'raw_magnet:magnitude_spectrum:spectral_entropy', 'raw_magnet:magnitude_autocorrelation:period', 'raw_magnet:magnitude_autocorrelation:normalized_ac', 'raw_magnet:3d:mean_x', 'raw_magnet:3d:mean_y', 'raw_magnet:3d:mean_z', 'raw_magnet:3d:std_x', 'raw_magnet:3d:std_y', 'raw_magnet:3d:std_z', 'raw_magnet:3d:ro_xy', 'raw_magnet:3d:ro_xz', 'raw_magnet:3d:ro_yz', 'raw_magnet:avr_cosine_similarity_lag_range0', 'raw_magnet:avr_cosine_similarity_lag_range1', 'raw_magnet:avr_cosine_similarity_lag_range2', 'raw_magnet:avr_cosine_similarity_lag_range3', 'raw_magnet:avr_cosine_similarity_lag_range4', 'watch_acceleration:magnitude_stats:mean', 'watch_acceleration:magnitude_stats:std', 'watch_acceleration:magnitude_stats:moment3', 'watch_acceleration:magnitude_stats:moment4', 'watch_acceleration:magnitude_stats:percentile25', 'watch_acceleration:magnitude_stats:percentile50', 'watch_acceleration:magnitude_stats:percentile75', 'watch_acceleration:magnitude_stats:value_entropy', 'watch_acceleration:magnitude_stats:time_entropy', 
'watch_acceleration:magnitude_spectrum:log_energy_band0', 'watch_acceleration:magnitude_spectrum:log_energy_band1', 'watch_acceleration:magnitude_spectrum:log_energy_band2', 'watch_acceleration:magnitude_spectrum:log_energy_band3', 'watch_acceleration:magnitude_spectrum:log_energy_band4', 'watch_acceleration:magnitude_spectrum:spectral_entropy', 'watch_acceleration:magnitude_autocorrelation:period', 'watch_acceleration:magnitude_autocorrelation:normalized_ac', 'watch_acceleration:3d:mean_x', 'watch_acceleration:3d:mean_y', 'watch_acceleration:3d:mean_z', 'watch_acceleration:3d:std_x', 'watch_acceleration:3d:std_y', 'watch_acceleration:3d:std_z', 'watch_acceleration:3d:ro_xy', 'watch_acceleration:3d:ro_xz', 'watch_acceleration:3d:ro_yz', 'watch_acceleration:spectrum:x_log_energy_band0', 'watch_acceleration:spectrum:x_log_energy_band1', 'watch_acceleration:spectrum:x_log_energy_band2', 'watch_acceleration:spectrum:x_log_energy_band3', 'watch_acceleration:spectrum:x_log_energy_band4', 'watch_acceleration:spectrum:y_log_energy_band0', 'watch_acceleration:spectrum:y_log_energy_band1', 'watch_acceleration:spectrum:y_log_energy_band2', 'watch_acceleration:spectrum:y_log_energy_band3', 'watch_acceleration:spectrum:y_log_energy_band4', 'watch_acceleration:spectrum:z_log_energy_band0', 'watch_acceleration:spectrum:z_log_energy_band1', 'watch_acceleration:spectrum:z_log_energy_band2', 'watch_acceleration:spectrum:z_log_energy_band3', 'watch_acceleration:spectrum:z_log_energy_band4', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range0', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range1', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range2', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range3', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range4', 'watch_heading:mean_cos', 'watch_heading:std_cos', 'watch_heading:mom3_cos', 'watch_heading:mom4_cos', 'watch_heading:mean_sin', 'watch_heading:std_sin', 'watch_heading:mom3_sin', 'watch_heading:mom4_sin', 'watch_heading:entropy_8bins', 'location:num_valid_updates', 'location:log_latitude_range', 'location:log_longitude_range', 'location:min_altitude', 'location:max_altitude', 'location:min_speed', 'location:max_speed', 'location:best_horizontal_accuracy', 'location:best_vertical_accuracy', 'location:diameter', 'location:log_diameter', 'location_quick_features:std_lat', 'location_quick_features:std_long', 'location_quick_features:lat_change', 'location_quick_features:long_change', 'location_quick_features:mean_abs_lat_deriv', 'location_quick_features:mean_abs_long_deriv', 'audio_naive:mfcc0:mean', 'audio_naive:mfcc1:mean', 'audio_naive:mfcc2:mean', 'audio_naive:mfcc3:mean', 'audio_naive:mfcc4:mean', 'audio_naive:mfcc5:mean', 'audio_naive:mfcc6:mean', 'audio_naive:mfcc7:mean', 'audio_naive:mfcc8:mean', 'audio_naive:mfcc9:mean', 'audio_naive:mfcc10:mean', 'audio_naive:mfcc11:mean', 'audio_naive:mfcc12:mean', 'audio_naive:mfcc0:std', 'audio_naive:mfcc1:std', 'audio_naive:mfcc2:std', 'audio_naive:mfcc3:std', 'audio_naive:mfcc4:std', 'audio_naive:mfcc5:std', 'audio_naive:mfcc6:std', 'audio_naive:mfcc7:std', 'audio_naive:mfcc8:std', 'audio_naive:mfcc9:std', 'audio_naive:mfcc10:std', 'audio_naive:mfcc11:std', 'audio_naive:mfcc12:std', 'audio_properties:max_abs_value', 'audio_properties:normalization_multiplier', 'lf_measurements:light', 'lf_measurements:pressure', 'lf_measurements:proximity_cm', 'lf_measurements:proximity', 'lf_measurements:relative_humidity', 
'lf_measurements:battery_level', 'lf_measurements:screen_brightness', 'lf_measurements:temperature_ambient', 'total_length']
WARNING:tensorflow:Layer lstm_31 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
4924/4924 [==============================] - 95s 19ms/step - loss: 1.6473
['timestamp', 'discrete:app_state:is_active', 'discrete:app_state:is_inactive', 'discrete:app_state:is_background', 'discrete:app_state:missing', 'discrete:battery_plugged:is_ac', 'discrete:battery_plugged:is_usb', 'discrete:battery_plugged:is_wireless', 'discrete:battery_plugged:missing', 'discrete:battery_state:is_unknown', 'discrete:battery_state:is_unplugged', 'discrete:battery_state:is_not_charging', 'discrete:battery_state:is_discharging', 'discrete:battery_state:is_charging', 'discrete:battery_state:is_full', 'discrete:battery_state:missing', 'discrete:on_the_phone:is_False', 'discrete:on_the_phone:is_True', 'discrete:on_the_phone:missing', 'discrete:ringer_mode:is_normal', 'discrete:ringer_mode:is_silent_no_vibrate', 'discrete:ringer_mode:is_silent_with_vibrate', 'discrete:ringer_mode:missing', 'discrete:wifi_status:is_not_reachable', 'discrete:wifi_status:is_reachable_via_wifi', 'discrete:wifi_status:is_reachable_via_wwan', 'discrete:wifi_status:missing', 'discrete:time_of_day:between0and6', 'discrete:time_of_day:between3and9', 'discrete:time_of_day:between6and12', 'discrete:time_of_day:between9and15', 'discrete:time_of_day:between12and18', 'discrete:time_of_day:between15and21', 'discrete:time_of_day:between18and24', 'discrete:time_of_day:between21and3']
In [ ]:
## Testing

import numpy as np
from sklearn.preprocessing import StandardScaler


def predict_from_df(df, model, features_to_include, look_back=3):
    """
    Process the given DataFrame and predict the next value using the LSTM model.
    
    Parameters:
    - df: DataFrame to process and predict from.
    - model: Trained LSTM model to use for predictions.
    - features_to_include: List of feature names to include in the prediction.
    - look_back: Number of previous time steps to use as input for predictions.
    
    Returns:
    - predictions: Predicted values for the next time step.
    """
    # Ensure the DataFrame contains the necessary features
    if not all(feature in df.columns for feature in features_to_include):
        raise ValueError("DataFrame missing required features")
    
    # Fill missing values with median
    df_filled = df[features_to_include].fillna(df[features_to_include].median())
    
    # Scale features (note: fits a fresh scaler on this slice; the training-time scaler is not reused here)
    scaler = StandardScaler().fit(df_filled)
    df_scaled = scaler.transform(df_filled)
    
    # Create sequences
    sequences = np.array([df_scaled[i - look_back:i] for i in range(look_back, len(df_scaled) + 1)])
    
    # Predict using the LSTM model
    predictions = model.predict(sequences)
    
    return predictions



# Example of how to use the function
# Ensure 'model', 'features_to_include', and 'look_back' are defined as per your model's training setup
predictions = predict_from_df(df_user[features_for_model[model_name]].iloc[1:4],  models[best_model_one_so_far], features_for_model[best_model_one_so_far], look_back=3)
print(predictions)
192 out of  279
Columns to remove due to excessive missing data: ['raw_acc:magnitude_stats:mean', 'raw_acc:magnitude_stats:std', 'raw_acc:magnitude_stats:moment3', 'raw_acc:magnitude_stats:moment4', 'raw_acc:magnitude_stats:percentile25', 'raw_acc:magnitude_stats:percentile50', 'raw_acc:magnitude_stats:percentile75', 'raw_acc:magnitude_stats:value_entropy', 'raw_acc:magnitude_stats:time_entropy', 'raw_acc:magnitude_spectrum:log_energy_band0', 'raw_acc:magnitude_spectrum:log_energy_band1', 'raw_acc:magnitude_spectrum:log_energy_band2', 'raw_acc:magnitude_spectrum:log_energy_band3', 'raw_acc:magnitude_spectrum:log_energy_band4', 'raw_acc:magnitude_spectrum:spectral_entropy', 'raw_acc:magnitude_autocorrelation:period', 'raw_acc:magnitude_autocorrelation:normalized_ac', 'raw_acc:3d:mean_x', 'raw_acc:3d:mean_y', 'raw_acc:3d:mean_z', 'raw_acc:3d:std_x', 'raw_acc:3d:std_y', 'raw_acc:3d:std_z', 'raw_acc:3d:ro_xy', 'raw_acc:3d:ro_xz', 'raw_acc:3d:ro_yz', 'proc_gyro:magnitude_stats:mean', 'proc_gyro:magnitude_stats:std', 'proc_gyro:magnitude_stats:moment3', 'proc_gyro:magnitude_stats:moment4', 'proc_gyro:magnitude_stats:percentile25', 'proc_gyro:magnitude_stats:percentile50', 'proc_gyro:magnitude_stats:percentile75', 'proc_gyro:magnitude_stats:value_entropy', 'proc_gyro:magnitude_stats:time_entropy', 'proc_gyro:magnitude_spectrum:log_energy_band0', 'proc_gyro:magnitude_spectrum:log_energy_band1', 'proc_gyro:magnitude_spectrum:log_energy_band2', 'proc_gyro:magnitude_spectrum:log_energy_band3', 'proc_gyro:magnitude_spectrum:log_energy_band4', 'proc_gyro:magnitude_spectrum:spectral_entropy', 'proc_gyro:magnitude_autocorrelation:period', 'proc_gyro:magnitude_autocorrelation:normalized_ac', 'proc_gyro:3d:mean_x', 'proc_gyro:3d:mean_y', 'proc_gyro:3d:mean_z', 'proc_gyro:3d:std_x', 'proc_gyro:3d:std_y', 'proc_gyro:3d:std_z', 'proc_gyro:3d:ro_xy', 'proc_gyro:3d:ro_xz', 'proc_gyro:3d:ro_yz', 'raw_magnet:magnitude_stats:mean', 'raw_magnet:magnitude_stats:std', 'raw_magnet:magnitude_stats:moment3', 'raw_magnet:magnitude_stats:moment4', 'raw_magnet:magnitude_stats:percentile25', 'raw_magnet:magnitude_stats:percentile50', 'raw_magnet:magnitude_stats:percentile75', 'raw_magnet:magnitude_stats:value_entropy', 'raw_magnet:magnitude_stats:time_entropy', 'raw_magnet:magnitude_spectrum:log_energy_band0', 'raw_magnet:magnitude_spectrum:log_energy_band1', 'raw_magnet:magnitude_spectrum:log_energy_band2', 'raw_magnet:magnitude_spectrum:log_energy_band3', 'raw_magnet:magnitude_spectrum:log_energy_band4', 'raw_magnet:magnitude_spectrum:spectral_entropy', 'raw_magnet:magnitude_autocorrelation:period', 'raw_magnet:magnitude_autocorrelation:normalized_ac', 'raw_magnet:3d:mean_x', 'raw_magnet:3d:mean_y', 'raw_magnet:3d:mean_z', 'raw_magnet:3d:std_x', 'raw_magnet:3d:std_y', 'raw_magnet:3d:std_z', 'raw_magnet:3d:ro_xy', 'raw_magnet:3d:ro_xz', 'raw_magnet:3d:ro_yz', 'raw_magnet:avr_cosine_similarity_lag_range0', 'raw_magnet:avr_cosine_similarity_lag_range1', 'raw_magnet:avr_cosine_similarity_lag_range2', 'raw_magnet:avr_cosine_similarity_lag_range3', 'raw_magnet:avr_cosine_similarity_lag_range4', 'watch_acceleration:magnitude_stats:mean', 'watch_acceleration:magnitude_stats:std', 'watch_acceleration:magnitude_stats:moment3', 'watch_acceleration:magnitude_stats:moment4', 'watch_acceleration:magnitude_stats:percentile25', 'watch_acceleration:magnitude_stats:percentile50', 'watch_acceleration:magnitude_stats:percentile75', 'watch_acceleration:magnitude_stats:value_entropy', 'watch_acceleration:magnitude_stats:time_entropy', 
'watch_acceleration:magnitude_spectrum:log_energy_band0', 'watch_acceleration:magnitude_spectrum:log_energy_band1', 'watch_acceleration:magnitude_spectrum:log_energy_band2', 'watch_acceleration:magnitude_spectrum:log_energy_band3', 'watch_acceleration:magnitude_spectrum:log_energy_band4', 'watch_acceleration:magnitude_spectrum:spectral_entropy', 'watch_acceleration:magnitude_autocorrelation:period', 'watch_acceleration:magnitude_autocorrelation:normalized_ac', 'watch_acceleration:3d:mean_x', 'watch_acceleration:3d:mean_y', 'watch_acceleration:3d:mean_z', 'watch_acceleration:3d:std_x', 'watch_acceleration:3d:std_y', 'watch_acceleration:3d:std_z', 'watch_acceleration:3d:ro_xy', 'watch_acceleration:3d:ro_xz', 'watch_acceleration:3d:ro_yz', 'watch_acceleration:spectrum:x_log_energy_band0', 'watch_acceleration:spectrum:x_log_energy_band1', 'watch_acceleration:spectrum:x_log_energy_band2', 'watch_acceleration:spectrum:x_log_energy_band3', 'watch_acceleration:spectrum:x_log_energy_band4', 'watch_acceleration:spectrum:y_log_energy_band0', 'watch_acceleration:spectrum:y_log_energy_band1', 'watch_acceleration:spectrum:y_log_energy_band2', 'watch_acceleration:spectrum:y_log_energy_band3', 'watch_acceleration:spectrum:y_log_energy_band4', 'watch_acceleration:spectrum:z_log_energy_band0', 'watch_acceleration:spectrum:z_log_energy_band1', 'watch_acceleration:spectrum:z_log_energy_band2', 'watch_acceleration:spectrum:z_log_energy_band3', 'watch_acceleration:spectrum:z_log_energy_band4', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range0', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range1', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range2', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range3', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range4', 'watch_heading:mean_cos', 'watch_heading:std_cos', 'watch_heading:mom3_cos', 'watch_heading:mom4_cos', 'watch_heading:mean_sin', 'watch_heading:std_sin', 'watch_heading:mom3_sin', 'watch_heading:mom4_sin', 'watch_heading:entropy_8bins', 'location:num_valid_updates', 'location:log_latitude_range', 'location:log_longitude_range', 'location:min_altitude', 'location:max_altitude', 'location:min_speed', 'location:max_speed', 'location:best_horizontal_accuracy', 'location:best_vertical_accuracy', 'location:diameter', 'location:log_diameter', 'location_quick_features:std_lat', 'location_quick_features:std_long', 'location_quick_features:lat_change', 'location_quick_features:long_change', 'location_quick_features:mean_abs_lat_deriv', 'location_quick_features:mean_abs_long_deriv', 'audio_naive:mfcc0:mean', 'audio_naive:mfcc1:mean', 'audio_naive:mfcc2:mean', 'audio_naive:mfcc3:mean', 'audio_naive:mfcc4:mean', 'audio_naive:mfcc5:mean', 'audio_naive:mfcc6:mean', 'audio_naive:mfcc7:mean', 'audio_naive:mfcc8:mean', 'audio_naive:mfcc9:mean', 'audio_naive:mfcc10:mean', 'audio_naive:mfcc11:mean', 'audio_naive:mfcc12:mean', 'audio_naive:mfcc0:std', 'audio_naive:mfcc1:std', 'audio_naive:mfcc2:std', 'audio_naive:mfcc3:std', 'audio_naive:mfcc4:std', 'audio_naive:mfcc5:std', 'audio_naive:mfcc6:std', 'audio_naive:mfcc7:std', 'audio_naive:mfcc8:std', 'audio_naive:mfcc9:std', 'audio_naive:mfcc10:std', 'audio_naive:mfcc11:std', 'audio_naive:mfcc12:std', 'audio_properties:max_abs_value', 'audio_properties:normalization_multiplier', 'lf_measurements:light', 'lf_measurements:pressure', 'lf_measurements:proximity_cm', 'lf_measurements:proximity', 'lf_measurements:relative_humidity', 
'lf_measurements:battery_level', 'lf_measurements:screen_brightness', 'lf_measurements:temperature_ambient', 'total_length']
WARNING:tensorflow:Layer lstm_30 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
4924/4924 [==============================] - 94s 19ms/step - loss: 0.2351
['timestamp', 'discrete:app_state:is_active', 'discrete:app_state:is_inactive', 'discrete:app_state:is_background', 'discrete:app_state:missing', 'discrete:battery_plugged:is_ac', 'discrete:battery_plugged:is_usb', 'discrete:battery_plugged:is_wireless', 'discrete:battery_plugged:missing', 'discrete:battery_state:is_unknown', 'discrete:battery_state:is_unplugged', 'discrete:battery_state:is_not_charging', 'discrete:battery_state:is_discharging', 'discrete:battery_state:is_charging', 'discrete:battery_state:is_full', 'discrete:battery_state:missing', 'discrete:on_the_phone:is_False', 'discrete:on_the_phone:is_True', 'discrete:on_the_phone:missing', 'discrete:ringer_mode:is_normal', 'discrete:ringer_mode:is_silent_no_vibrate', 'discrete:ringer_mode:is_silent_with_vibrate', 'discrete:ringer_mode:missing', 'discrete:wifi_status:is_not_reachable', 'discrete:wifi_status:is_reachable_via_wifi', 'discrete:wifi_status:is_reachable_via_wwan', 'discrete:wifi_status:missing', 'discrete:time_of_day:between0and6', 'discrete:time_of_day:between3and9', 'discrete:time_of_day:between6and12', 'discrete:time_of_day:between9and15', 'discrete:time_of_day:between12and18', 'discrete:time_of_day:between15and21', 'discrete:time_of_day:between18and24', 'discrete:time_of_day:between21and3']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[78], line 47
     43 print(features_for_model[best_model_one_so_far])
     45 # Example of how to use the function
     46 # Ensure 'model', 'features_to_include', and 'look_back' are defined as per your model's training setup
---> 47 predictions = predict_from_df(df_user.iloc[1:4],  models_with_threshhold[best_model_one_so_far], features_to_include[best_model_one_so_far], look_back=3)
     48 print(predictions)

KeyError: 'threshold0_median_epoch1'
In [ ]:
print(features_for_model[model_name])
df_user[features_for_model[model_name]].iloc[4].values
['timestamp', 'discrete:app_state:is_active', 'discrete:app_state:is_inactive', 'discrete:app_state:is_background', 'discrete:app_state:missing', 'discrete:battery_plugged:is_ac', 'discrete:battery_plugged:is_usb', 'discrete:battery_plugged:is_wireless', 'discrete:battery_plugged:missing', 'discrete:battery_state:is_unknown', 'discrete:battery_state:is_unplugged', 'discrete:battery_state:is_not_charging', 'discrete:battery_state:is_discharging', 'discrete:battery_state:is_charging', 'discrete:battery_state:is_full', 'discrete:battery_state:missing', 'discrete:on_the_phone:is_False', 'discrete:on_the_phone:is_True', 'discrete:on_the_phone:missing', 'discrete:ringer_mode:is_normal', 'discrete:ringer_mode:is_silent_no_vibrate', 'discrete:ringer_mode:is_silent_with_vibrate', 'discrete:ringer_mode:missing', 'discrete:wifi_status:is_not_reachable', 'discrete:wifi_status:is_reachable_via_wifi', 'discrete:wifi_status:is_reachable_via_wwan', 'discrete:wifi_status:missing', 'discrete:time_of_day:between0and6', 'discrete:time_of_day:between3and9', 'discrete:time_of_day:between6and12', 'discrete:time_of_day:between9and15', 'discrete:time_of_day:between12and18', 'discrete:time_of_day:between15and21', 'discrete:time_of_day:between18and24', 'discrete:time_of_day:between21and3']
Out[ ]:
array([1.44831694e+09, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       1.00000000e+00, 1.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 1.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       1.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 1.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       1.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 1.00000000e+00, 1.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00])
In [ ]:
models_with_threshhold = {}
features_to_include = {}
for i in range(6):
    print("for threshold: ", i)
    models_with_threshhold[i], features_to_include[i] = testing_threshold(i, -1)
for threshold:  0
192 out of  279
Columns to remove due to excessive missing data: ['raw_acc:magnitude_stats:mean', 'raw_acc:magnitude_stats:std', 'raw_acc:magnitude_stats:moment3', 'raw_acc:magnitude_stats:moment4', 'raw_acc:magnitude_stats:percentile25', 'raw_acc:magnitude_stats:percentile50', 'raw_acc:magnitude_stats:percentile75', 'raw_acc:magnitude_stats:value_entropy', 'raw_acc:magnitude_stats:time_entropy', 'raw_acc:magnitude_spectrum:log_energy_band0', 'raw_acc:magnitude_spectrum:log_energy_band1', 'raw_acc:magnitude_spectrum:log_energy_band2', 'raw_acc:magnitude_spectrum:log_energy_band3', 'raw_acc:magnitude_spectrum:log_energy_band4', 'raw_acc:magnitude_spectrum:spectral_entropy', 'raw_acc:magnitude_autocorrelation:period', 'raw_acc:magnitude_autocorrelation:normalized_ac', 'raw_acc:3d:mean_x', 'raw_acc:3d:mean_y', 'raw_acc:3d:mean_z', 'raw_acc:3d:std_x', 'raw_acc:3d:std_y', 'raw_acc:3d:std_z', 'raw_acc:3d:ro_xy', 'raw_acc:3d:ro_xz', 'raw_acc:3d:ro_yz', 'proc_gyro:magnitude_stats:mean', 'proc_gyro:magnitude_stats:std', 'proc_gyro:magnitude_stats:moment3', 'proc_gyro:magnitude_stats:moment4', 'proc_gyro:magnitude_stats:percentile25', 'proc_gyro:magnitude_stats:percentile50', 'proc_gyro:magnitude_stats:percentile75', 'proc_gyro:magnitude_stats:value_entropy', 'proc_gyro:magnitude_stats:time_entropy', 'proc_gyro:magnitude_spectrum:log_energy_band0', 'proc_gyro:magnitude_spectrum:log_energy_band1', 'proc_gyro:magnitude_spectrum:log_energy_band2', 'proc_gyro:magnitude_spectrum:log_energy_band3', 'proc_gyro:magnitude_spectrum:log_energy_band4', 'proc_gyro:magnitude_spectrum:spectral_entropy', 'proc_gyro:magnitude_autocorrelation:period', 'proc_gyro:magnitude_autocorrelation:normalized_ac', 'proc_gyro:3d:mean_x', 'proc_gyro:3d:mean_y', 'proc_gyro:3d:mean_z', 'proc_gyro:3d:std_x', 'proc_gyro:3d:std_y', 'proc_gyro:3d:std_z', 'proc_gyro:3d:ro_xy', 'proc_gyro:3d:ro_xz', 'proc_gyro:3d:ro_yz', 'raw_magnet:magnitude_stats:mean', 'raw_magnet:magnitude_stats:std', 'raw_magnet:magnitude_stats:moment3', 'raw_magnet:magnitude_stats:moment4', 'raw_magnet:magnitude_stats:percentile25', 'raw_magnet:magnitude_stats:percentile50', 'raw_magnet:magnitude_stats:percentile75', 'raw_magnet:magnitude_stats:value_entropy', 'raw_magnet:magnitude_stats:time_entropy', 'raw_magnet:magnitude_spectrum:log_energy_band0', 'raw_magnet:magnitude_spectrum:log_energy_band1', 'raw_magnet:magnitude_spectrum:log_energy_band2', 'raw_magnet:magnitude_spectrum:log_energy_band3', 'raw_magnet:magnitude_spectrum:log_energy_band4', 'raw_magnet:magnitude_spectrum:spectral_entropy', 'raw_magnet:magnitude_autocorrelation:period', 'raw_magnet:magnitude_autocorrelation:normalized_ac', 'raw_magnet:3d:mean_x', 'raw_magnet:3d:mean_y', 'raw_magnet:3d:mean_z', 'raw_magnet:3d:std_x', 'raw_magnet:3d:std_y', 'raw_magnet:3d:std_z', 'raw_magnet:3d:ro_xy', 'raw_magnet:3d:ro_xz', 'raw_magnet:3d:ro_yz', 'raw_magnet:avr_cosine_similarity_lag_range0', 'raw_magnet:avr_cosine_similarity_lag_range1', 'raw_magnet:avr_cosine_similarity_lag_range2', 'raw_magnet:avr_cosine_similarity_lag_range3', 'raw_magnet:avr_cosine_similarity_lag_range4', 'watch_acceleration:magnitude_stats:mean', 'watch_acceleration:magnitude_stats:std', 'watch_acceleration:magnitude_stats:moment3', 'watch_acceleration:magnitude_stats:moment4', 'watch_acceleration:magnitude_stats:percentile25', 'watch_acceleration:magnitude_stats:percentile50', 'watch_acceleration:magnitude_stats:percentile75', 'watch_acceleration:magnitude_stats:value_entropy', 'watch_acceleration:magnitude_stats:time_entropy', 
'watch_acceleration:magnitude_spectrum:log_energy_band0', 'watch_acceleration:magnitude_spectrum:log_energy_band1', 'watch_acceleration:magnitude_spectrum:log_energy_band2', 'watch_acceleration:magnitude_spectrum:log_energy_band3', 'watch_acceleration:magnitude_spectrum:log_energy_band4', 'watch_acceleration:magnitude_spectrum:spectral_entropy', 'watch_acceleration:magnitude_autocorrelation:period', 'watch_acceleration:magnitude_autocorrelation:normalized_ac', 'watch_acceleration:3d:mean_x', 'watch_acceleration:3d:mean_y', 'watch_acceleration:3d:mean_z', 'watch_acceleration:3d:std_x', 'watch_acceleration:3d:std_y', 'watch_acceleration:3d:std_z', 'watch_acceleration:3d:ro_xy', 'watch_acceleration:3d:ro_xz', 'watch_acceleration:3d:ro_yz', 'watch_acceleration:spectrum:x_log_energy_band0', 'watch_acceleration:spectrum:x_log_energy_band1', 'watch_acceleration:spectrum:x_log_energy_band2', 'watch_acceleration:spectrum:x_log_energy_band3', 'watch_acceleration:spectrum:x_log_energy_band4', 'watch_acceleration:spectrum:y_log_energy_band0', 'watch_acceleration:spectrum:y_log_energy_band1', 'watch_acceleration:spectrum:y_log_energy_band2', 'watch_acceleration:spectrum:y_log_energy_band3', 'watch_acceleration:spectrum:y_log_energy_band4', 'watch_acceleration:spectrum:z_log_energy_band0', 'watch_acceleration:spectrum:z_log_energy_band1', 'watch_acceleration:spectrum:z_log_energy_band2', 'watch_acceleration:spectrum:z_log_energy_band3', 'watch_acceleration:spectrum:z_log_energy_band4', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range0', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range1', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range2', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range3', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range4', 'watch_heading:mean_cos', 'watch_heading:std_cos', 'watch_heading:mom3_cos', 'watch_heading:mom4_cos', 'watch_heading:mean_sin', 'watch_heading:std_sin', 'watch_heading:mom3_sin', 'watch_heading:mom4_sin', 'watch_heading:entropy_8bins', 'location:num_valid_updates', 'location:log_latitude_range', 'location:log_longitude_range', 'location:min_altitude', 'location:max_altitude', 'location:min_speed', 'location:max_speed', 'location:best_horizontal_accuracy', 'location:best_vertical_accuracy', 'location:diameter', 'location:log_diameter', 'location_quick_features:std_lat', 'location_quick_features:std_long', 'location_quick_features:lat_change', 'location_quick_features:long_change', 'location_quick_features:mean_abs_lat_deriv', 'location_quick_features:mean_abs_long_deriv', 'audio_naive:mfcc0:mean', 'audio_naive:mfcc1:mean', 'audio_naive:mfcc2:mean', 'audio_naive:mfcc3:mean', 'audio_naive:mfcc4:mean', 'audio_naive:mfcc5:mean', 'audio_naive:mfcc6:mean', 'audio_naive:mfcc7:mean', 'audio_naive:mfcc8:mean', 'audio_naive:mfcc9:mean', 'audio_naive:mfcc10:mean', 'audio_naive:mfcc11:mean', 'audio_naive:mfcc12:mean', 'audio_naive:mfcc0:std', 'audio_naive:mfcc1:std', 'audio_naive:mfcc2:std', 'audio_naive:mfcc3:std', 'audio_naive:mfcc4:std', 'audio_naive:mfcc5:std', 'audio_naive:mfcc6:std', 'audio_naive:mfcc7:std', 'audio_naive:mfcc8:std', 'audio_naive:mfcc9:std', 'audio_naive:mfcc10:std', 'audio_naive:mfcc11:std', 'audio_naive:mfcc12:std', 'audio_properties:max_abs_value', 'audio_properties:normalization_multiplier', 'lf_measurements:light', 'lf_measurements:pressure', 'lf_measurements:proximity_cm', 'lf_measurements:proximity', 'lf_measurements:relative_humidity', 
'lf_measurements:battery_level', 'lf_measurements:screen_brightness', 'lf_measurements:temperature_ambient', 'total_length']
WARNING:tensorflow:Layer lstm_2 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
4924/4924 [==============================] - 95s 19ms/step - loss: 0.7676
for threshold:  1
165 out of  279
Columns to remove due to excessive missing data: ['proc_gyro:magnitude_stats:mean', 'proc_gyro:magnitude_stats:std', 'proc_gyro:magnitude_stats:moment3', 'proc_gyro:magnitude_stats:moment4', 'proc_gyro:magnitude_stats:percentile25', 'proc_gyro:magnitude_stats:percentile50', 'proc_gyro:magnitude_stats:percentile75', 'proc_gyro:magnitude_stats:value_entropy', 'proc_gyro:magnitude_stats:time_entropy', 'proc_gyro:magnitude_spectrum:log_energy_band0', 'proc_gyro:magnitude_spectrum:log_energy_band1', 'proc_gyro:magnitude_spectrum:log_energy_band2', 'proc_gyro:magnitude_spectrum:log_energy_band3', 'proc_gyro:magnitude_spectrum:log_energy_band4', 'proc_gyro:magnitude_spectrum:spectral_entropy', 'proc_gyro:magnitude_autocorrelation:period', 'proc_gyro:magnitude_autocorrelation:normalized_ac', 'proc_gyro:3d:mean_x', 'proc_gyro:3d:mean_y', 'proc_gyro:3d:mean_z', 'proc_gyro:3d:std_x', 'proc_gyro:3d:std_y', 'proc_gyro:3d:std_z', 'proc_gyro:3d:ro_xy', 'proc_gyro:3d:ro_xz', 'proc_gyro:3d:ro_yz', 'raw_magnet:magnitude_stats:mean', 'raw_magnet:magnitude_stats:std', 'raw_magnet:magnitude_stats:moment3', 'raw_magnet:magnitude_stats:moment4', 'raw_magnet:magnitude_stats:percentile25', 'raw_magnet:magnitude_stats:percentile50', 'raw_magnet:magnitude_stats:percentile75', 'raw_magnet:magnitude_stats:value_entropy', 'raw_magnet:magnitude_stats:time_entropy', 'raw_magnet:magnitude_spectrum:log_energy_band0', 'raw_magnet:magnitude_spectrum:log_energy_band1', 'raw_magnet:magnitude_spectrum:log_energy_band2', 'raw_magnet:magnitude_spectrum:log_energy_band3', 'raw_magnet:magnitude_spectrum:log_energy_band4', 'raw_magnet:magnitude_spectrum:spectral_entropy', 'raw_magnet:magnitude_autocorrelation:period', 'raw_magnet:magnitude_autocorrelation:normalized_ac', 'raw_magnet:3d:mean_x', 'raw_magnet:3d:mean_y', 'raw_magnet:3d:mean_z', 'raw_magnet:3d:std_x', 'raw_magnet:3d:std_y', 'raw_magnet:3d:std_z', 'raw_magnet:3d:ro_xy', 'raw_magnet:3d:ro_xz', 'raw_magnet:3d:ro_yz', 'raw_magnet:avr_cosine_similarity_lag_range0', 'raw_magnet:avr_cosine_similarity_lag_range1', 'raw_magnet:avr_cosine_similarity_lag_range2', 'raw_magnet:avr_cosine_similarity_lag_range3', 'raw_magnet:avr_cosine_similarity_lag_range4', 'watch_acceleration:magnitude_stats:mean', 'watch_acceleration:magnitude_stats:std', 'watch_acceleration:magnitude_stats:moment3', 'watch_acceleration:magnitude_stats:moment4', 'watch_acceleration:magnitude_stats:percentile25', 'watch_acceleration:magnitude_stats:percentile50', 'watch_acceleration:magnitude_stats:percentile75', 'watch_acceleration:magnitude_stats:value_entropy', 'watch_acceleration:magnitude_stats:time_entropy', 'watch_acceleration:magnitude_spectrum:log_energy_band0', 'watch_acceleration:magnitude_spectrum:log_energy_band1', 'watch_acceleration:magnitude_spectrum:log_energy_band2', 'watch_acceleration:magnitude_spectrum:log_energy_band3', 'watch_acceleration:magnitude_spectrum:log_energy_band4', 'watch_acceleration:magnitude_spectrum:spectral_entropy', 'watch_acceleration:magnitude_autocorrelation:period', 'watch_acceleration:magnitude_autocorrelation:normalized_ac', 'watch_acceleration:3d:mean_x', 'watch_acceleration:3d:mean_y', 'watch_acceleration:3d:mean_z', 'watch_acceleration:3d:std_x', 'watch_acceleration:3d:std_y', 'watch_acceleration:3d:std_z', 'watch_acceleration:3d:ro_xy', 'watch_acceleration:3d:ro_xz', 'watch_acceleration:3d:ro_yz', 'watch_acceleration:spectrum:x_log_energy_band0', 'watch_acceleration:spectrum:x_log_energy_band1', 'watch_acceleration:spectrum:x_log_energy_band2', 
'watch_acceleration:spectrum:x_log_energy_band3', 'watch_acceleration:spectrum:x_log_energy_band4', 'watch_acceleration:spectrum:y_log_energy_band0', 'watch_acceleration:spectrum:y_log_energy_band1', 'watch_acceleration:spectrum:y_log_energy_band2', 'watch_acceleration:spectrum:y_log_energy_band3', 'watch_acceleration:spectrum:y_log_energy_band4', 'watch_acceleration:spectrum:z_log_energy_band0', 'watch_acceleration:spectrum:z_log_energy_band1', 'watch_acceleration:spectrum:z_log_energy_band2', 'watch_acceleration:spectrum:z_log_energy_band3', 'watch_acceleration:spectrum:z_log_energy_band4', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range0', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range1', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range2', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range3', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range4', 'watch_heading:mean_cos', 'watch_heading:std_cos', 'watch_heading:mom3_cos', 'watch_heading:mom4_cos', 'watch_heading:mean_sin', 'watch_heading:std_sin', 'watch_heading:mom3_sin', 'watch_heading:mom4_sin', 'watch_heading:entropy_8bins', 'location:num_valid_updates', 'location:log_latitude_range', 'location:log_longitude_range', 'location:min_altitude', 'location:max_altitude', 'location:min_speed', 'location:max_speed', 'location:best_horizontal_accuracy', 'location:best_vertical_accuracy', 'location:diameter', 'location:log_diameter', 'location_quick_features:std_lat', 'location_quick_features:std_long', 'location_quick_features:lat_change', 'location_quick_features:long_change', 'location_quick_features:mean_abs_lat_deriv', 'location_quick_features:mean_abs_long_deriv', 'audio_naive:mfcc0:mean', 'audio_naive:mfcc1:mean', 'audio_naive:mfcc2:mean', 'audio_naive:mfcc3:mean', 'audio_naive:mfcc4:mean', 'audio_naive:mfcc5:mean', 'audio_naive:mfcc6:mean', 'audio_naive:mfcc7:mean', 'audio_naive:mfcc8:mean', 'audio_naive:mfcc9:mean', 'audio_naive:mfcc10:mean', 'audio_naive:mfcc11:mean', 'audio_naive:mfcc12:mean', 'audio_naive:mfcc0:std', 'audio_naive:mfcc1:std', 'audio_naive:mfcc2:std', 'audio_naive:mfcc3:std', 'audio_naive:mfcc4:std', 'audio_naive:mfcc5:std', 'audio_naive:mfcc6:std', 'audio_naive:mfcc7:std', 'audio_naive:mfcc8:std', 'audio_naive:mfcc9:std', 'audio_naive:mfcc10:std', 'audio_naive:mfcc11:std', 'audio_naive:mfcc12:std', 'audio_properties:max_abs_value', 'audio_properties:normalization_multiplier', 'lf_measurements:light', 'lf_measurements:pressure', 'lf_measurements:proximity_cm', 'lf_measurements:proximity', 'lf_measurements:relative_humidity', 'lf_measurements:screen_brightness', 'lf_measurements:temperature_ambient', 'total_length']
WARNING:tensorflow:Layer lstm_3 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
4924/4924 [==============================] - 95s 19ms/step - loss: 0.9039
for threshold:  2
163 out of  279
Columns to remove due to excessive missing data: ['proc_gyro:magnitude_stats:mean', 'proc_gyro:magnitude_stats:std', 'proc_gyro:magnitude_stats:moment3', 'proc_gyro:magnitude_stats:moment4', 'proc_gyro:magnitude_stats:percentile25', 'proc_gyro:magnitude_stats:percentile50', 'proc_gyro:magnitude_stats:percentile75', 'proc_gyro:magnitude_stats:value_entropy', 'proc_gyro:magnitude_stats:time_entropy', 'proc_gyro:magnitude_spectrum:log_energy_band0', 'proc_gyro:magnitude_spectrum:log_energy_band1', 'proc_gyro:magnitude_spectrum:log_energy_band2', 'proc_gyro:magnitude_spectrum:log_energy_band3', 'proc_gyro:magnitude_spectrum:log_energy_band4', 'proc_gyro:magnitude_spectrum:spectral_entropy', 'proc_gyro:magnitude_autocorrelation:period', 'proc_gyro:magnitude_autocorrelation:normalized_ac', 'proc_gyro:3d:mean_x', 'proc_gyro:3d:mean_y', 'proc_gyro:3d:mean_z', 'proc_gyro:3d:std_x', 'proc_gyro:3d:std_y', 'proc_gyro:3d:std_z', 'proc_gyro:3d:ro_xy', 'proc_gyro:3d:ro_xz', 'proc_gyro:3d:ro_yz', 'raw_magnet:magnitude_stats:mean', 'raw_magnet:magnitude_stats:std', 'raw_magnet:magnitude_stats:moment3', 'raw_magnet:magnitude_stats:moment4', 'raw_magnet:magnitude_stats:percentile25', 'raw_magnet:magnitude_stats:percentile50', 'raw_magnet:magnitude_stats:percentile75', 'raw_magnet:magnitude_stats:value_entropy', 'raw_magnet:magnitude_stats:time_entropy', 'raw_magnet:magnitude_spectrum:log_energy_band0', 'raw_magnet:magnitude_spectrum:log_energy_band1', 'raw_magnet:magnitude_spectrum:log_energy_band2', 'raw_magnet:magnitude_spectrum:log_energy_band3', 'raw_magnet:magnitude_spectrum:log_energy_band4', 'raw_magnet:magnitude_spectrum:spectral_entropy', 'raw_magnet:magnitude_autocorrelation:period', 'raw_magnet:magnitude_autocorrelation:normalized_ac', 'raw_magnet:3d:mean_x', 'raw_magnet:3d:mean_y', 'raw_magnet:3d:mean_z', 'raw_magnet:3d:std_x', 'raw_magnet:3d:std_y', 'raw_magnet:3d:std_z', 'raw_magnet:3d:ro_xy', 'raw_magnet:3d:ro_xz', 'raw_magnet:3d:ro_yz', 'raw_magnet:avr_cosine_similarity_lag_range0', 'raw_magnet:avr_cosine_similarity_lag_range1', 'raw_magnet:avr_cosine_similarity_lag_range2', 'raw_magnet:avr_cosine_similarity_lag_range3', 'raw_magnet:avr_cosine_similarity_lag_range4', 'watch_acceleration:magnitude_stats:mean', 'watch_acceleration:magnitude_stats:std', 'watch_acceleration:magnitude_stats:moment3', 'watch_acceleration:magnitude_stats:moment4', 'watch_acceleration:magnitude_stats:percentile25', 'watch_acceleration:magnitude_stats:percentile50', 'watch_acceleration:magnitude_stats:percentile75', 'watch_acceleration:magnitude_stats:value_entropy', 'watch_acceleration:magnitude_stats:time_entropy', 'watch_acceleration:magnitude_spectrum:log_energy_band0', 'watch_acceleration:magnitude_spectrum:log_energy_band1', 'watch_acceleration:magnitude_spectrum:log_energy_band2', 'watch_acceleration:magnitude_spectrum:log_energy_band3', 'watch_acceleration:magnitude_spectrum:log_energy_band4', 'watch_acceleration:magnitude_spectrum:spectral_entropy', 'watch_acceleration:magnitude_autocorrelation:period', 'watch_acceleration:magnitude_autocorrelation:normalized_ac', 'watch_acceleration:3d:mean_x', 'watch_acceleration:3d:mean_y', 'watch_acceleration:3d:mean_z', 'watch_acceleration:3d:std_x', 'watch_acceleration:3d:std_y', 'watch_acceleration:3d:std_z', 'watch_acceleration:3d:ro_xy', 'watch_acceleration:3d:ro_xz', 'watch_acceleration:3d:ro_yz', 'watch_acceleration:spectrum:x_log_energy_band0', 'watch_acceleration:spectrum:x_log_energy_band1', 'watch_acceleration:spectrum:x_log_energy_band2', 
'watch_acceleration:spectrum:x_log_energy_band3', 'watch_acceleration:spectrum:x_log_energy_band4', 'watch_acceleration:spectrum:y_log_energy_band0', 'watch_acceleration:spectrum:y_log_energy_band1', 'watch_acceleration:spectrum:y_log_energy_band2', 'watch_acceleration:spectrum:y_log_energy_band3', 'watch_acceleration:spectrum:y_log_energy_band4', 'watch_acceleration:spectrum:z_log_energy_band0', 'watch_acceleration:spectrum:z_log_energy_band1', 'watch_acceleration:spectrum:z_log_energy_band2', 'watch_acceleration:spectrum:z_log_energy_band3', 'watch_acceleration:spectrum:z_log_energy_band4', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range0', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range1', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range2', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range3', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range4', 'watch_heading:mean_cos', 'watch_heading:std_cos', 'watch_heading:mom3_cos', 'watch_heading:mom4_cos', 'watch_heading:mean_sin', 'watch_heading:std_sin', 'watch_heading:mom3_sin', 'watch_heading:mom4_sin', 'watch_heading:entropy_8bins', 'location:num_valid_updates', 'location:log_latitude_range', 'location:log_longitude_range', 'location:min_altitude', 'location:max_altitude', 'location:min_speed', 'location:max_speed', 'location:best_horizontal_accuracy', 'location:best_vertical_accuracy', 'location:diameter', 'location:log_diameter', 'location_quick_features:std_lat', 'location_quick_features:std_long', 'location_quick_features:lat_change', 'location_quick_features:long_change', 'location_quick_features:mean_abs_lat_deriv', 'location_quick_features:mean_abs_long_deriv', 'audio_naive:mfcc0:mean', 'audio_naive:mfcc1:mean', 'audio_naive:mfcc2:mean', 'audio_naive:mfcc3:mean', 'audio_naive:mfcc4:mean', 'audio_naive:mfcc5:mean', 'audio_naive:mfcc6:mean', 'audio_naive:mfcc7:mean', 'audio_naive:mfcc8:mean', 'audio_naive:mfcc9:mean', 'audio_naive:mfcc10:mean', 'audio_naive:mfcc11:mean', 'audio_naive:mfcc12:mean', 'audio_naive:mfcc0:std', 'audio_naive:mfcc1:std', 'audio_naive:mfcc2:std', 'audio_naive:mfcc3:std', 'audio_naive:mfcc4:std', 'audio_naive:mfcc5:std', 'audio_naive:mfcc6:std', 'audio_naive:mfcc7:std', 'audio_naive:mfcc8:std', 'audio_naive:mfcc9:std', 'audio_naive:mfcc10:std', 'audio_naive:mfcc11:std', 'audio_naive:mfcc12:std', 'lf_measurements:light', 'lf_measurements:pressure', 'lf_measurements:proximity_cm', 'lf_measurements:proximity', 'lf_measurements:relative_humidity', 'lf_measurements:screen_brightness', 'lf_measurements:temperature_ambient', 'total_length']
WARNING:tensorflow:Layer lstm_4 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
4924/4924 [==============================] - 95s 19ms/step - loss: 0.9191
for threshold:  3
137 out of  279
Columns to remove due to excessive missing data: ['proc_gyro:magnitude_stats:mean', 'proc_gyro:magnitude_stats:std', 'proc_gyro:magnitude_stats:moment3', 'proc_gyro:magnitude_stats:moment4', 'proc_gyro:magnitude_stats:percentile25', 'proc_gyro:magnitude_stats:percentile50', 'proc_gyro:magnitude_stats:percentile75', 'proc_gyro:magnitude_stats:value_entropy', 'proc_gyro:magnitude_stats:time_entropy', 'proc_gyro:magnitude_spectrum:log_energy_band0', 'proc_gyro:magnitude_spectrum:log_energy_band1', 'proc_gyro:magnitude_spectrum:log_energy_band2', 'proc_gyro:magnitude_spectrum:log_energy_band3', 'proc_gyro:magnitude_spectrum:log_energy_band4', 'proc_gyro:magnitude_spectrum:spectral_entropy', 'proc_gyro:magnitude_autocorrelation:period', 'proc_gyro:magnitude_autocorrelation:normalized_ac', 'proc_gyro:3d:mean_x', 'proc_gyro:3d:mean_y', 'proc_gyro:3d:mean_z', 'proc_gyro:3d:std_x', 'proc_gyro:3d:std_y', 'proc_gyro:3d:std_z', 'proc_gyro:3d:ro_xy', 'proc_gyro:3d:ro_xz', 'proc_gyro:3d:ro_yz', 'raw_magnet:magnitude_stats:mean', 'raw_magnet:magnitude_stats:std', 'raw_magnet:magnitude_stats:moment3', 'raw_magnet:magnitude_stats:moment4', 'raw_magnet:magnitude_stats:percentile25', 'raw_magnet:magnitude_stats:percentile50', 'raw_magnet:magnitude_stats:percentile75', 'raw_magnet:magnitude_stats:value_entropy', 'raw_magnet:magnitude_stats:time_entropy', 'raw_magnet:magnitude_spectrum:log_energy_band0', 'raw_magnet:magnitude_spectrum:log_energy_band1', 'raw_magnet:magnitude_spectrum:log_energy_band2', 'raw_magnet:magnitude_spectrum:log_energy_band3', 'raw_magnet:magnitude_spectrum:log_energy_band4', 'raw_magnet:magnitude_spectrum:spectral_entropy', 'raw_magnet:magnitude_autocorrelation:period', 'raw_magnet:magnitude_autocorrelation:normalized_ac', 'raw_magnet:3d:mean_x', 'raw_magnet:3d:mean_y', 'raw_magnet:3d:mean_z', 'raw_magnet:3d:std_x', 'raw_magnet:3d:std_y', 'raw_magnet:3d:std_z', 'raw_magnet:3d:ro_xy', 'raw_magnet:3d:ro_xz', 'raw_magnet:3d:ro_yz', 'raw_magnet:avr_cosine_similarity_lag_range0', 'raw_magnet:avr_cosine_similarity_lag_range1', 'raw_magnet:avr_cosine_similarity_lag_range2', 'raw_magnet:avr_cosine_similarity_lag_range3', 'raw_magnet:avr_cosine_similarity_lag_range4', 'watch_acceleration:magnitude_stats:mean', 'watch_acceleration:magnitude_stats:std', 'watch_acceleration:magnitude_stats:moment3', 'watch_acceleration:magnitude_stats:moment4', 'watch_acceleration:magnitude_stats:percentile25', 'watch_acceleration:magnitude_stats:percentile50', 'watch_acceleration:magnitude_stats:percentile75', 'watch_acceleration:magnitude_stats:value_entropy', 'watch_acceleration:magnitude_stats:time_entropy', 'watch_acceleration:magnitude_spectrum:log_energy_band0', 'watch_acceleration:magnitude_spectrum:log_energy_band1', 'watch_acceleration:magnitude_spectrum:log_energy_band2', 'watch_acceleration:magnitude_spectrum:log_energy_band3', 'watch_acceleration:magnitude_spectrum:log_energy_band4', 'watch_acceleration:magnitude_spectrum:spectral_entropy', 'watch_acceleration:magnitude_autocorrelation:period', 'watch_acceleration:magnitude_autocorrelation:normalized_ac', 'watch_acceleration:3d:mean_x', 'watch_acceleration:3d:mean_y', 'watch_acceleration:3d:mean_z', 'watch_acceleration:3d:std_x', 'watch_acceleration:3d:std_y', 'watch_acceleration:3d:std_z', 'watch_acceleration:3d:ro_xy', 'watch_acceleration:3d:ro_xz', 'watch_acceleration:3d:ro_yz', 'watch_acceleration:spectrum:x_log_energy_band0', 'watch_acceleration:spectrum:x_log_energy_band1', 'watch_acceleration:spectrum:x_log_energy_band2', 
'watch_acceleration:spectrum:x_log_energy_band3', 'watch_acceleration:spectrum:x_log_energy_band4', 'watch_acceleration:spectrum:y_log_energy_band0', 'watch_acceleration:spectrum:y_log_energy_band1', 'watch_acceleration:spectrum:y_log_energy_band2', 'watch_acceleration:spectrum:y_log_energy_band3', 'watch_acceleration:spectrum:y_log_energy_band4', 'watch_acceleration:spectrum:z_log_energy_band0', 'watch_acceleration:spectrum:z_log_energy_band1', 'watch_acceleration:spectrum:z_log_energy_band2', 'watch_acceleration:spectrum:z_log_energy_band3', 'watch_acceleration:spectrum:z_log_energy_band4', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range0', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range1', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range2', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range3', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range4', 'watch_heading:mean_cos', 'watch_heading:std_cos', 'watch_heading:mom3_cos', 'watch_heading:mom4_cos', 'watch_heading:mean_sin', 'watch_heading:std_sin', 'watch_heading:mom3_sin', 'watch_heading:mom4_sin', 'watch_heading:entropy_8bins', 'location:num_valid_updates', 'location:log_latitude_range', 'location:log_longitude_range', 'location:min_altitude', 'location:max_altitude', 'location:min_speed', 'location:max_speed', 'location:best_horizontal_accuracy', 'location:best_vertical_accuracy', 'location:diameter', 'location:log_diameter', 'location_quick_features:std_lat', 'location_quick_features:std_long', 'location_quick_features:lat_change', 'location_quick_features:long_change', 'location_quick_features:mean_abs_lat_deriv', 'location_quick_features:mean_abs_long_deriv', 'lf_measurements:light', 'lf_measurements:pressure', 'lf_measurements:proximity_cm', 'lf_measurements:proximity', 'lf_measurements:relative_humidity', 'lf_measurements:screen_brightness', 'lf_measurements:temperature_ambient', 'total_length']
WARNING:tensorflow:Layer lstm_5 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
4924/4924 [==============================] - 97s 20ms/step - loss: 0.9159
for threshold:  4
137 out of  279
Columns to remove due to excessive missing data: ['proc_gyro:magnitude_stats:mean', 'proc_gyro:magnitude_stats:std', 'proc_gyro:magnitude_stats:moment3', 'proc_gyro:magnitude_stats:moment4', 'proc_gyro:magnitude_stats:percentile25', 'proc_gyro:magnitude_stats:percentile50', 'proc_gyro:magnitude_stats:percentile75', 'proc_gyro:magnitude_stats:value_entropy', 'proc_gyro:magnitude_stats:time_entropy', 'proc_gyro:magnitude_spectrum:log_energy_band0', 'proc_gyro:magnitude_spectrum:log_energy_band1', 'proc_gyro:magnitude_spectrum:log_energy_band2', 'proc_gyro:magnitude_spectrum:log_energy_band3', 'proc_gyro:magnitude_spectrum:log_energy_band4', 'proc_gyro:magnitude_spectrum:spectral_entropy', 'proc_gyro:magnitude_autocorrelation:period', 'proc_gyro:magnitude_autocorrelation:normalized_ac', 'proc_gyro:3d:mean_x', 'proc_gyro:3d:mean_y', 'proc_gyro:3d:mean_z', 'proc_gyro:3d:std_x', 'proc_gyro:3d:std_y', 'proc_gyro:3d:std_z', 'proc_gyro:3d:ro_xy', 'proc_gyro:3d:ro_xz', 'proc_gyro:3d:ro_yz', 'raw_magnet:magnitude_stats:mean', 'raw_magnet:magnitude_stats:std', 'raw_magnet:magnitude_stats:moment3', 'raw_magnet:magnitude_stats:moment4', 'raw_magnet:magnitude_stats:percentile25', 'raw_magnet:magnitude_stats:percentile50', 'raw_magnet:magnitude_stats:percentile75', 'raw_magnet:magnitude_stats:value_entropy', 'raw_magnet:magnitude_stats:time_entropy', 'raw_magnet:magnitude_spectrum:log_energy_band0', 'raw_magnet:magnitude_spectrum:log_energy_band1', 'raw_magnet:magnitude_spectrum:log_energy_band2', 'raw_magnet:magnitude_spectrum:log_energy_band3', 'raw_magnet:magnitude_spectrum:log_energy_band4', 'raw_magnet:magnitude_spectrum:spectral_entropy', 'raw_magnet:magnitude_autocorrelation:period', 'raw_magnet:magnitude_autocorrelation:normalized_ac', 'raw_magnet:3d:mean_x', 'raw_magnet:3d:mean_y', 'raw_magnet:3d:mean_z', 'raw_magnet:3d:std_x', 'raw_magnet:3d:std_y', 'raw_magnet:3d:std_z', 'raw_magnet:3d:ro_xy', 'raw_magnet:3d:ro_xz', 'raw_magnet:3d:ro_yz', 'raw_magnet:avr_cosine_similarity_lag_range0', 'raw_magnet:avr_cosine_similarity_lag_range1', 'raw_magnet:avr_cosine_similarity_lag_range2', 'raw_magnet:avr_cosine_similarity_lag_range3', 'raw_magnet:avr_cosine_similarity_lag_range4', 'watch_acceleration:magnitude_stats:mean', 'watch_acceleration:magnitude_stats:std', 'watch_acceleration:magnitude_stats:moment3', 'watch_acceleration:magnitude_stats:moment4', 'watch_acceleration:magnitude_stats:percentile25', 'watch_acceleration:magnitude_stats:percentile50', 'watch_acceleration:magnitude_stats:percentile75', 'watch_acceleration:magnitude_stats:value_entropy', 'watch_acceleration:magnitude_stats:time_entropy', 'watch_acceleration:magnitude_spectrum:log_energy_band0', 'watch_acceleration:magnitude_spectrum:log_energy_band1', 'watch_acceleration:magnitude_spectrum:log_energy_band2', 'watch_acceleration:magnitude_spectrum:log_energy_band3', 'watch_acceleration:magnitude_spectrum:log_energy_band4', 'watch_acceleration:magnitude_spectrum:spectral_entropy', 'watch_acceleration:magnitude_autocorrelation:period', 'watch_acceleration:magnitude_autocorrelation:normalized_ac', 'watch_acceleration:3d:mean_x', 'watch_acceleration:3d:mean_y', 'watch_acceleration:3d:mean_z', 'watch_acceleration:3d:std_x', 'watch_acceleration:3d:std_y', 'watch_acceleration:3d:std_z', 'watch_acceleration:3d:ro_xy', 'watch_acceleration:3d:ro_xz', 'watch_acceleration:3d:ro_yz', 'watch_acceleration:spectrum:x_log_energy_band0', 'watch_acceleration:spectrum:x_log_energy_band1', 'watch_acceleration:spectrum:x_log_energy_band2', 
'watch_acceleration:spectrum:x_log_energy_band3', 'watch_acceleration:spectrum:x_log_energy_band4', 'watch_acceleration:spectrum:y_log_energy_band0', 'watch_acceleration:spectrum:y_log_energy_band1', 'watch_acceleration:spectrum:y_log_energy_band2', 'watch_acceleration:spectrum:y_log_energy_band3', 'watch_acceleration:spectrum:y_log_energy_band4', 'watch_acceleration:spectrum:z_log_energy_band0', 'watch_acceleration:spectrum:z_log_energy_band1', 'watch_acceleration:spectrum:z_log_energy_band2', 'watch_acceleration:spectrum:z_log_energy_band3', 'watch_acceleration:spectrum:z_log_energy_band4', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range0', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range1', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range2', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range3', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range4', 'watch_heading:mean_cos', 'watch_heading:std_cos', 'watch_heading:mom3_cos', 'watch_heading:mom4_cos', 'watch_heading:mean_sin', 'watch_heading:std_sin', 'watch_heading:mom3_sin', 'watch_heading:mom4_sin', 'watch_heading:entropy_8bins', 'location:num_valid_updates', 'location:log_latitude_range', 'location:log_longitude_range', 'location:min_altitude', 'location:max_altitude', 'location:min_speed', 'location:max_speed', 'location:best_horizontal_accuracy', 'location:best_vertical_accuracy', 'location:diameter', 'location:log_diameter', 'location_quick_features:std_lat', 'location_quick_features:std_long', 'location_quick_features:lat_change', 'location_quick_features:long_change', 'location_quick_features:mean_abs_lat_deriv', 'location_quick_features:mean_abs_long_deriv', 'lf_measurements:light', 'lf_measurements:pressure', 'lf_measurements:proximity_cm', 'lf_measurements:proximity', 'lf_measurements:relative_humidity', 'lf_measurements:screen_brightness', 'lf_measurements:temperature_ambient', 'total_length']
WARNING:tensorflow:Layer lstm_6 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
4924/4924 [==============================] - 97s 20ms/step - loss: 0.9998
for threshold:  5
114 out of  279
Columns to remove due to excessive missing data: ['proc_gyro:3d:ro_xy', 'proc_gyro:3d:ro_xz', 'proc_gyro:3d:ro_yz', 'raw_magnet:magnitude_stats:mean', 'raw_magnet:magnitude_stats:std', 'raw_magnet:magnitude_stats:moment3', 'raw_magnet:magnitude_stats:moment4', 'raw_magnet:magnitude_stats:percentile25', 'raw_magnet:magnitude_stats:percentile50', 'raw_magnet:magnitude_stats:percentile75', 'raw_magnet:magnitude_stats:value_entropy', 'raw_magnet:magnitude_stats:time_entropy', 'raw_magnet:magnitude_spectrum:log_energy_band0', 'raw_magnet:magnitude_spectrum:log_energy_band1', 'raw_magnet:magnitude_spectrum:log_energy_band2', 'raw_magnet:magnitude_spectrum:log_energy_band3', 'raw_magnet:magnitude_spectrum:log_energy_band4', 'raw_magnet:magnitude_spectrum:spectral_entropy', 'raw_magnet:magnitude_autocorrelation:period', 'raw_magnet:magnitude_autocorrelation:normalized_ac', 'raw_magnet:3d:mean_x', 'raw_magnet:3d:mean_y', 'raw_magnet:3d:mean_z', 'raw_magnet:3d:std_x', 'raw_magnet:3d:std_y', 'raw_magnet:3d:std_z', 'raw_magnet:3d:ro_xy', 'raw_magnet:3d:ro_xz', 'raw_magnet:3d:ro_yz', 'raw_magnet:avr_cosine_similarity_lag_range0', 'raw_magnet:avr_cosine_similarity_lag_range1', 'raw_magnet:avr_cosine_similarity_lag_range2', 'raw_magnet:avr_cosine_similarity_lag_range3', 'raw_magnet:avr_cosine_similarity_lag_range4', 'watch_acceleration:magnitude_stats:mean', 'watch_acceleration:magnitude_stats:std', 'watch_acceleration:magnitude_stats:moment3', 'watch_acceleration:magnitude_stats:moment4', 'watch_acceleration:magnitude_stats:percentile25', 'watch_acceleration:magnitude_stats:percentile50', 'watch_acceleration:magnitude_stats:percentile75', 'watch_acceleration:magnitude_stats:value_entropy', 'watch_acceleration:magnitude_stats:time_entropy', 'watch_acceleration:magnitude_spectrum:log_energy_band0', 'watch_acceleration:magnitude_spectrum:log_energy_band1', 'watch_acceleration:magnitude_spectrum:log_energy_band2', 'watch_acceleration:magnitude_spectrum:log_energy_band3', 'watch_acceleration:magnitude_spectrum:log_energy_band4', 'watch_acceleration:magnitude_spectrum:spectral_entropy', 'watch_acceleration:magnitude_autocorrelation:period', 'watch_acceleration:magnitude_autocorrelation:normalized_ac', 'watch_acceleration:3d:mean_x', 'watch_acceleration:3d:mean_y', 'watch_acceleration:3d:mean_z', 'watch_acceleration:3d:std_x', 'watch_acceleration:3d:std_y', 'watch_acceleration:3d:std_z', 'watch_acceleration:3d:ro_xy', 'watch_acceleration:3d:ro_xz', 'watch_acceleration:3d:ro_yz', 'watch_acceleration:spectrum:x_log_energy_band0', 'watch_acceleration:spectrum:x_log_energy_band1', 'watch_acceleration:spectrum:x_log_energy_band2', 'watch_acceleration:spectrum:x_log_energy_band3', 'watch_acceleration:spectrum:x_log_energy_band4', 'watch_acceleration:spectrum:y_log_energy_band0', 'watch_acceleration:spectrum:y_log_energy_band1', 'watch_acceleration:spectrum:y_log_energy_band2', 'watch_acceleration:spectrum:y_log_energy_band3', 'watch_acceleration:spectrum:y_log_energy_band4', 'watch_acceleration:spectrum:z_log_energy_band0', 'watch_acceleration:spectrum:z_log_energy_band1', 'watch_acceleration:spectrum:z_log_energy_band2', 'watch_acceleration:spectrum:z_log_energy_band3', 'watch_acceleration:spectrum:z_log_energy_band4', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range0', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range1', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range2', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range3', 
'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range4', 'watch_heading:mean_cos', 'watch_heading:std_cos', 'watch_heading:mom3_cos', 'watch_heading:mom4_cos', 'watch_heading:mean_sin', 'watch_heading:std_sin', 'watch_heading:mom3_sin', 'watch_heading:mom4_sin', 'watch_heading:entropy_8bins', 'location:num_valid_updates', 'location:log_latitude_range', 'location:log_longitude_range', 'location:min_altitude', 'location:max_altitude', 'location:min_speed', 'location:max_speed', 'location:best_horizontal_accuracy', 'location:best_vertical_accuracy', 'location:diameter', 'location:log_diameter', 'location_quick_features:std_lat', 'location_quick_features:std_long', 'location_quick_features:lat_change', 'location_quick_features:long_change', 'location_quick_features:mean_abs_lat_deriv', 'location_quick_features:mean_abs_long_deriv', 'lf_measurements:light', 'lf_measurements:pressure', 'lf_measurements:proximity_cm', 'lf_measurements:proximity', 'lf_measurements:relative_humidity', 'lf_measurements:screen_brightness', 'lf_measurements:temperature_ambient', 'total_length']
WARNING:tensorflow:Layer lstm_7 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
4924/4924 [==============================] - 96s 19ms/step - loss: 1.0113
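For reference, here is a minimal sketch of the column-dropping step behind the threshold sweep above. The notebook's own testing_threshold() is defined in an earlier cell; this illustrative helper assumes the threshold is an upper bound on how many missing entries a feature column may have before it is listed for removal (the names columns_to_drop and max_missing are ours, not the dataset's or the notebook's).

import pandas as pd

def columns_to_drop(features: pd.DataFrame, max_missing: int) -> list:
    # Count NaNs per feature column and flag columns that exceed the missing-data budget.
    missing_per_column = features.isna().sum()
    return [col for col, n_missing in missing_per_column.items() if n_missing > max_missing]

# Usage sketch (hypothetical variable names): keep only columns within the budget.
# drop_cols = columns_to_drop(df[feature_columns], max_missing=0)
# df_reduced = df.drop(columns=drop_cols)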
In [ ]:
fill_options = ['mean', 'median', 'zero']
for fill in fill_options:
    # Compare imputation strategies at missing-data threshold 5, one model per fill type
    print("Checking different fills at threshold 5 with", fill, "fill option")
    models_with_threshhold[fill], features_to_include[fill] = testing_threshold(5, -1, fill_type=fill)
Checking different fills at threshold 5 with mean fill option
114 out of  279
Columns to remove due to excessive missing data: ['proc_gyro:3d:ro_xy', 'proc_gyro:3d:ro_xz', 'proc_gyro:3d:ro_yz', 'raw_magnet:magnitude_stats:mean', 'raw_magnet:magnitude_stats:std', 'raw_magnet:magnitude_stats:moment3', 'raw_magnet:magnitude_stats:moment4', 'raw_magnet:magnitude_stats:percentile25', 'raw_magnet:magnitude_stats:percentile50', 'raw_magnet:magnitude_stats:percentile75', 'raw_magnet:magnitude_stats:value_entropy', 'raw_magnet:magnitude_stats:time_entropy', 'raw_magnet:magnitude_spectrum:log_energy_band0', 'raw_magnet:magnitude_spectrum:log_energy_band1', 'raw_magnet:magnitude_spectrum:log_energy_band2', 'raw_magnet:magnitude_spectrum:log_energy_band3', 'raw_magnet:magnitude_spectrum:log_energy_band4', 'raw_magnet:magnitude_spectrum:spectral_entropy', 'raw_magnet:magnitude_autocorrelation:period', 'raw_magnet:magnitude_autocorrelation:normalized_ac', 'raw_magnet:3d:mean_x', 'raw_magnet:3d:mean_y', 'raw_magnet:3d:mean_z', 'raw_magnet:3d:std_x', 'raw_magnet:3d:std_y', 'raw_magnet:3d:std_z', 'raw_magnet:3d:ro_xy', 'raw_magnet:3d:ro_xz', 'raw_magnet:3d:ro_yz', 'raw_magnet:avr_cosine_similarity_lag_range0', 'raw_magnet:avr_cosine_similarity_lag_range1', 'raw_magnet:avr_cosine_similarity_lag_range2', 'raw_magnet:avr_cosine_similarity_lag_range3', 'raw_magnet:avr_cosine_similarity_lag_range4', 'watch_acceleration:magnitude_stats:mean', 'watch_acceleration:magnitude_stats:std', 'watch_acceleration:magnitude_stats:moment3', 'watch_acceleration:magnitude_stats:moment4', 'watch_acceleration:magnitude_stats:percentile25', 'watch_acceleration:magnitude_stats:percentile50', 'watch_acceleration:magnitude_stats:percentile75', 'watch_acceleration:magnitude_stats:value_entropy', 'watch_acceleration:magnitude_stats:time_entropy', 'watch_acceleration:magnitude_spectrum:log_energy_band0', 'watch_acceleration:magnitude_spectrum:log_energy_band1', 'watch_acceleration:magnitude_spectrum:log_energy_band2', 'watch_acceleration:magnitude_spectrum:log_energy_band3', 'watch_acceleration:magnitude_spectrum:log_energy_band4', 'watch_acceleration:magnitude_spectrum:spectral_entropy', 'watch_acceleration:magnitude_autocorrelation:period', 'watch_acceleration:magnitude_autocorrelation:normalized_ac', 'watch_acceleration:3d:mean_x', 'watch_acceleration:3d:mean_y', 'watch_acceleration:3d:mean_z', 'watch_acceleration:3d:std_x', 'watch_acceleration:3d:std_y', 'watch_acceleration:3d:std_z', 'watch_acceleration:3d:ro_xy', 'watch_acceleration:3d:ro_xz', 'watch_acceleration:3d:ro_yz', 'watch_acceleration:spectrum:x_log_energy_band0', 'watch_acceleration:spectrum:x_log_energy_band1', 'watch_acceleration:spectrum:x_log_energy_band2', 'watch_acceleration:spectrum:x_log_energy_band3', 'watch_acceleration:spectrum:x_log_energy_band4', 'watch_acceleration:spectrum:y_log_energy_band0', 'watch_acceleration:spectrum:y_log_energy_band1', 'watch_acceleration:spectrum:y_log_energy_band2', 'watch_acceleration:spectrum:y_log_energy_band3', 'watch_acceleration:spectrum:y_log_energy_band4', 'watch_acceleration:spectrum:z_log_energy_band0', 'watch_acceleration:spectrum:z_log_energy_band1', 'watch_acceleration:spectrum:z_log_energy_band2', 'watch_acceleration:spectrum:z_log_energy_band3', 'watch_acceleration:spectrum:z_log_energy_band4', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range0', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range1', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range2', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range3', 
'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range4', 'watch_heading:mean_cos', 'watch_heading:std_cos', 'watch_heading:mom3_cos', 'watch_heading:mom4_cos', 'watch_heading:mean_sin', 'watch_heading:std_sin', 'watch_heading:mom3_sin', 'watch_heading:mom4_sin', 'watch_heading:entropy_8bins', 'location:num_valid_updates', 'location:log_latitude_range', 'location:log_longitude_range', 'location:min_altitude', 'location:max_altitude', 'location:min_speed', 'location:max_speed', 'location:best_horizontal_accuracy', 'location:best_vertical_accuracy', 'location:diameter', 'location:log_diameter', 'location_quick_features:std_lat', 'location_quick_features:std_long', 'location_quick_features:lat_change', 'location_quick_features:long_change', 'location_quick_features:mean_abs_lat_deriv', 'location_quick_features:mean_abs_long_deriv', 'lf_measurements:light', 'lf_measurements:pressure', 'lf_measurements:proximity_cm', 'lf_measurements:proximity', 'lf_measurements:relative_humidity', 'lf_measurements:screen_brightness', 'lf_measurements:temperature_ambient', 'total_length']
WARNING:tensorflow:Layer lstm_11 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
4924/4924 [==============================] - 95s 19ms/step - loss: 1.0229
Checking different fills at threshold 5 with median fill option
114 out of  279
Columns to remove due to excessive missing data: ['proc_gyro:3d:ro_xy', 'proc_gyro:3d:ro_xz', 'proc_gyro:3d:ro_yz', 'raw_magnet:magnitude_stats:mean', 'raw_magnet:magnitude_stats:std', 'raw_magnet:magnitude_stats:moment3', 'raw_magnet:magnitude_stats:moment4', 'raw_magnet:magnitude_stats:percentile25', 'raw_magnet:magnitude_stats:percentile50', 'raw_magnet:magnitude_stats:percentile75', 'raw_magnet:magnitude_stats:value_entropy', 'raw_magnet:magnitude_stats:time_entropy', 'raw_magnet:magnitude_spectrum:log_energy_band0', 'raw_magnet:magnitude_spectrum:log_energy_band1', 'raw_magnet:magnitude_spectrum:log_energy_band2', 'raw_magnet:magnitude_spectrum:log_energy_band3', 'raw_magnet:magnitude_spectrum:log_energy_band4', 'raw_magnet:magnitude_spectrum:spectral_entropy', 'raw_magnet:magnitude_autocorrelation:period', 'raw_magnet:magnitude_autocorrelation:normalized_ac', 'raw_magnet:3d:mean_x', 'raw_magnet:3d:mean_y', 'raw_magnet:3d:mean_z', 'raw_magnet:3d:std_x', 'raw_magnet:3d:std_y', 'raw_magnet:3d:std_z', 'raw_magnet:3d:ro_xy', 'raw_magnet:3d:ro_xz', 'raw_magnet:3d:ro_yz', 'raw_magnet:avr_cosine_similarity_lag_range0', 'raw_magnet:avr_cosine_similarity_lag_range1', 'raw_magnet:avr_cosine_similarity_lag_range2', 'raw_magnet:avr_cosine_similarity_lag_range3', 'raw_magnet:avr_cosine_similarity_lag_range4', 'watch_acceleration:magnitude_stats:mean', 'watch_acceleration:magnitude_stats:std', 'watch_acceleration:magnitude_stats:moment3', 'watch_acceleration:magnitude_stats:moment4', 'watch_acceleration:magnitude_stats:percentile25', 'watch_acceleration:magnitude_stats:percentile50', 'watch_acceleration:magnitude_stats:percentile75', 'watch_acceleration:magnitude_stats:value_entropy', 'watch_acceleration:magnitude_stats:time_entropy', 'watch_acceleration:magnitude_spectrum:log_energy_band0', 'watch_acceleration:magnitude_spectrum:log_energy_band1', 'watch_acceleration:magnitude_spectrum:log_energy_band2', 'watch_acceleration:magnitude_spectrum:log_energy_band3', 'watch_acceleration:magnitude_spectrum:log_energy_band4', 'watch_acceleration:magnitude_spectrum:spectral_entropy', 'watch_acceleration:magnitude_autocorrelation:period', 'watch_acceleration:magnitude_autocorrelation:normalized_ac', 'watch_acceleration:3d:mean_x', 'watch_acceleration:3d:mean_y', 'watch_acceleration:3d:mean_z', 'watch_acceleration:3d:std_x', 'watch_acceleration:3d:std_y', 'watch_acceleration:3d:std_z', 'watch_acceleration:3d:ro_xy', 'watch_acceleration:3d:ro_xz', 'watch_acceleration:3d:ro_yz', 'watch_acceleration:spectrum:x_log_energy_band0', 'watch_acceleration:spectrum:x_log_energy_band1', 'watch_acceleration:spectrum:x_log_energy_band2', 'watch_acceleration:spectrum:x_log_energy_band3', 'watch_acceleration:spectrum:x_log_energy_band4', 'watch_acceleration:spectrum:y_log_energy_band0', 'watch_acceleration:spectrum:y_log_energy_band1', 'watch_acceleration:spectrum:y_log_energy_band2', 'watch_acceleration:spectrum:y_log_energy_band3', 'watch_acceleration:spectrum:y_log_energy_band4', 'watch_acceleration:spectrum:z_log_energy_band0', 'watch_acceleration:spectrum:z_log_energy_band1', 'watch_acceleration:spectrum:z_log_energy_band2', 'watch_acceleration:spectrum:z_log_energy_band3', 'watch_acceleration:spectrum:z_log_energy_band4', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range0', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range1', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range2', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range3', 
'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range4', 'watch_heading:mean_cos', 'watch_heading:std_cos', 'watch_heading:mom3_cos', 'watch_heading:mom4_cos', 'watch_heading:mean_sin', 'watch_heading:std_sin', 'watch_heading:mom3_sin', 'watch_heading:mom4_sin', 'watch_heading:entropy_8bins', 'location:num_valid_updates', 'location:log_latitude_range', 'location:log_longitude_range', 'location:min_altitude', 'location:max_altitude', 'location:min_speed', 'location:max_speed', 'location:best_horizontal_accuracy', 'location:best_vertical_accuracy', 'location:diameter', 'location:log_diameter', 'location_quick_features:std_lat', 'location_quick_features:std_long', 'location_quick_features:lat_change', 'location_quick_features:long_change', 'location_quick_features:mean_abs_lat_deriv', 'location_quick_features:mean_abs_long_deriv', 'lf_measurements:light', 'lf_measurements:pressure', 'lf_measurements:proximity_cm', 'lf_measurements:proximity', 'lf_measurements:relative_humidity', 'lf_measurements:screen_brightness', 'lf_measurements:temperature_ambient', 'total_length']
WARNING:tensorflow:Layer lstm_12 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
4924/4924 [==============================] - 95s 19ms/step - loss: 0.9555
Checking different fills at threshold 5 with zero fill option
114 out of  279
Columns to remove due to excessive missing data: ['proc_gyro:3d:ro_xy', 'proc_gyro:3d:ro_xz', 'proc_gyro:3d:ro_yz', 'raw_magnet:magnitude_stats:mean', 'raw_magnet:magnitude_stats:std', 'raw_magnet:magnitude_stats:moment3', 'raw_magnet:magnitude_stats:moment4', 'raw_magnet:magnitude_stats:percentile25', 'raw_magnet:magnitude_stats:percentile50', 'raw_magnet:magnitude_stats:percentile75', 'raw_magnet:magnitude_stats:value_entropy', 'raw_magnet:magnitude_stats:time_entropy', 'raw_magnet:magnitude_spectrum:log_energy_band0', 'raw_magnet:magnitude_spectrum:log_energy_band1', 'raw_magnet:magnitude_spectrum:log_energy_band2', 'raw_magnet:magnitude_spectrum:log_energy_band3', 'raw_magnet:magnitude_spectrum:log_energy_band4', 'raw_magnet:magnitude_spectrum:spectral_entropy', 'raw_magnet:magnitude_autocorrelation:period', 'raw_magnet:magnitude_autocorrelation:normalized_ac', 'raw_magnet:3d:mean_x', 'raw_magnet:3d:mean_y', 'raw_magnet:3d:mean_z', 'raw_magnet:3d:std_x', 'raw_magnet:3d:std_y', 'raw_magnet:3d:std_z', 'raw_magnet:3d:ro_xy', 'raw_magnet:3d:ro_xz', 'raw_magnet:3d:ro_yz', 'raw_magnet:avr_cosine_similarity_lag_range0', 'raw_magnet:avr_cosine_similarity_lag_range1', 'raw_magnet:avr_cosine_similarity_lag_range2', 'raw_magnet:avr_cosine_similarity_lag_range3', 'raw_magnet:avr_cosine_similarity_lag_range4', 'watch_acceleration:magnitude_stats:mean', 'watch_acceleration:magnitude_stats:std', 'watch_acceleration:magnitude_stats:moment3', 'watch_acceleration:magnitude_stats:moment4', 'watch_acceleration:magnitude_stats:percentile25', 'watch_acceleration:magnitude_stats:percentile50', 'watch_acceleration:magnitude_stats:percentile75', 'watch_acceleration:magnitude_stats:value_entropy', 'watch_acceleration:magnitude_stats:time_entropy', 'watch_acceleration:magnitude_spectrum:log_energy_band0', 'watch_acceleration:magnitude_spectrum:log_energy_band1', 'watch_acceleration:magnitude_spectrum:log_energy_band2', 'watch_acceleration:magnitude_spectrum:log_energy_band3', 'watch_acceleration:magnitude_spectrum:log_energy_band4', 'watch_acceleration:magnitude_spectrum:spectral_entropy', 'watch_acceleration:magnitude_autocorrelation:period', 'watch_acceleration:magnitude_autocorrelation:normalized_ac', 'watch_acceleration:3d:mean_x', 'watch_acceleration:3d:mean_y', 'watch_acceleration:3d:mean_z', 'watch_acceleration:3d:std_x', 'watch_acceleration:3d:std_y', 'watch_acceleration:3d:std_z', 'watch_acceleration:3d:ro_xy', 'watch_acceleration:3d:ro_xz', 'watch_acceleration:3d:ro_yz', 'watch_acceleration:spectrum:x_log_energy_band0', 'watch_acceleration:spectrum:x_log_energy_band1', 'watch_acceleration:spectrum:x_log_energy_band2', 'watch_acceleration:spectrum:x_log_energy_band3', 'watch_acceleration:spectrum:x_log_energy_band4', 'watch_acceleration:spectrum:y_log_energy_band0', 'watch_acceleration:spectrum:y_log_energy_band1', 'watch_acceleration:spectrum:y_log_energy_band2', 'watch_acceleration:spectrum:y_log_energy_band3', 'watch_acceleration:spectrum:y_log_energy_band4', 'watch_acceleration:spectrum:z_log_energy_band0', 'watch_acceleration:spectrum:z_log_energy_band1', 'watch_acceleration:spectrum:z_log_energy_band2', 'watch_acceleration:spectrum:z_log_energy_band3', 'watch_acceleration:spectrum:z_log_energy_band4', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range0', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range1', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range2', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range3', 
'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range4', 'watch_heading:mean_cos', 'watch_heading:std_cos', 'watch_heading:mom3_cos', 'watch_heading:mom4_cos', 'watch_heading:mean_sin', 'watch_heading:std_sin', 'watch_heading:mom3_sin', 'watch_heading:mom4_sin', 'watch_heading:entropy_8bins', 'location:num_valid_updates', 'location:log_latitude_range', 'location:log_longitude_range', 'location:min_altitude', 'location:max_altitude', 'location:min_speed', 'location:max_speed', 'location:best_horizontal_accuracy', 'location:best_vertical_accuracy', 'location:diameter', 'location:log_diameter', 'location_quick_features:std_lat', 'location_quick_features:std_long', 'location_quick_features:lat_change', 'location_quick_features:long_change', 'location_quick_features:mean_abs_lat_deriv', 'location_quick_features:mean_abs_long_deriv', 'lf_measurements:light', 'lf_measurements:pressure', 'lf_measurements:proximity_cm', 'lf_measurements:proximity', 'lf_measurements:relative_humidity', 'lf_measurements:screen_brightness', 'lf_measurements:temperature_ambient', 'total_length']
WARNING:tensorflow:Layer lstm_13 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
4924/4924 [==============================] - 100s 20ms/step - loss: 0.9557
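The three fill options above correspond to simple column-wise imputation of whatever NaNs remain after the column filter. A minimal sketch, assuming each surviving feature column is imputed independently (fill_missing is an illustrative name, not a function from the notebook):

import pandas as pd

def fill_missing(features: pd.DataFrame, fill_type: str) -> pd.DataFrame:
    # Impute remaining NaNs column-wise with the mean, the median, or a constant zero.
    if fill_type == 'mean':
        return features.fillna(features.mean(numeric_only=True))
    if fill_type == 'median':
        return features.fillna(features.median(numeric_only=True))
    if fill_type == 'zero':
        return features.fillna(0)
    raise ValueError(f"Unknown fill_type: {fill_type!r}")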

Testing epoch = 2¶

In [ ]:
model_name = "threshold0_median_epoch2"
models_with_threshhold[model_name], features_to_include[model_name]  = testing_threshold(0, -1, fill_type = 'median', epochs= 2)
192 out of  279
Columns to remove due to excessive missing data: ['raw_acc:magnitude_stats:mean', 'raw_acc:magnitude_stats:std', 'raw_acc:magnitude_stats:moment3', 'raw_acc:magnitude_stats:moment4', 'raw_acc:magnitude_stats:percentile25', 'raw_acc:magnitude_stats:percentile50', 'raw_acc:magnitude_stats:percentile75', 'raw_acc:magnitude_stats:value_entropy', 'raw_acc:magnitude_stats:time_entropy', 'raw_acc:magnitude_spectrum:log_energy_band0', 'raw_acc:magnitude_spectrum:log_energy_band1', 'raw_acc:magnitude_spectrum:log_energy_band2', 'raw_acc:magnitude_spectrum:log_energy_band3', 'raw_acc:magnitude_spectrum:log_energy_band4', 'raw_acc:magnitude_spectrum:spectral_entropy', 'raw_acc:magnitude_autocorrelation:period', 'raw_acc:magnitude_autocorrelation:normalized_ac', 'raw_acc:3d:mean_x', 'raw_acc:3d:mean_y', 'raw_acc:3d:mean_z', 'raw_acc:3d:std_x', 'raw_acc:3d:std_y', 'raw_acc:3d:std_z', 'raw_acc:3d:ro_xy', 'raw_acc:3d:ro_xz', 'raw_acc:3d:ro_yz', 'proc_gyro:magnitude_stats:mean', 'proc_gyro:magnitude_stats:std', 'proc_gyro:magnitude_stats:moment3', 'proc_gyro:magnitude_stats:moment4', 'proc_gyro:magnitude_stats:percentile25', 'proc_gyro:magnitude_stats:percentile50', 'proc_gyro:magnitude_stats:percentile75', 'proc_gyro:magnitude_stats:value_entropy', 'proc_gyro:magnitude_stats:time_entropy', 'proc_gyro:magnitude_spectrum:log_energy_band0', 'proc_gyro:magnitude_spectrum:log_energy_band1', 'proc_gyro:magnitude_spectrum:log_energy_band2', 'proc_gyro:magnitude_spectrum:log_energy_band3', 'proc_gyro:magnitude_spectrum:log_energy_band4', 'proc_gyro:magnitude_spectrum:spectral_entropy', 'proc_gyro:magnitude_autocorrelation:period', 'proc_gyro:magnitude_autocorrelation:normalized_ac', 'proc_gyro:3d:mean_x', 'proc_gyro:3d:mean_y', 'proc_gyro:3d:mean_z', 'proc_gyro:3d:std_x', 'proc_gyro:3d:std_y', 'proc_gyro:3d:std_z', 'proc_gyro:3d:ro_xy', 'proc_gyro:3d:ro_xz', 'proc_gyro:3d:ro_yz', 'raw_magnet:magnitude_stats:mean', 'raw_magnet:magnitude_stats:std', 'raw_magnet:magnitude_stats:moment3', 'raw_magnet:magnitude_stats:moment4', 'raw_magnet:magnitude_stats:percentile25', 'raw_magnet:magnitude_stats:percentile50', 'raw_magnet:magnitude_stats:percentile75', 'raw_magnet:magnitude_stats:value_entropy', 'raw_magnet:magnitude_stats:time_entropy', 'raw_magnet:magnitude_spectrum:log_energy_band0', 'raw_magnet:magnitude_spectrum:log_energy_band1', 'raw_magnet:magnitude_spectrum:log_energy_band2', 'raw_magnet:magnitude_spectrum:log_energy_band3', 'raw_magnet:magnitude_spectrum:log_energy_band4', 'raw_magnet:magnitude_spectrum:spectral_entropy', 'raw_magnet:magnitude_autocorrelation:period', 'raw_magnet:magnitude_autocorrelation:normalized_ac', 'raw_magnet:3d:mean_x', 'raw_magnet:3d:mean_y', 'raw_magnet:3d:mean_z', 'raw_magnet:3d:std_x', 'raw_magnet:3d:std_y', 'raw_magnet:3d:std_z', 'raw_magnet:3d:ro_xy', 'raw_magnet:3d:ro_xz', 'raw_magnet:3d:ro_yz', 'raw_magnet:avr_cosine_similarity_lag_range0', 'raw_magnet:avr_cosine_similarity_lag_range1', 'raw_magnet:avr_cosine_similarity_lag_range2', 'raw_magnet:avr_cosine_similarity_lag_range3', 'raw_magnet:avr_cosine_similarity_lag_range4', 'watch_acceleration:magnitude_stats:mean', 'watch_acceleration:magnitude_stats:std', 'watch_acceleration:magnitude_stats:moment3', 'watch_acceleration:magnitude_stats:moment4', 'watch_acceleration:magnitude_stats:percentile25', 'watch_acceleration:magnitude_stats:percentile50', 'watch_acceleration:magnitude_stats:percentile75', 'watch_acceleration:magnitude_stats:value_entropy', 'watch_acceleration:magnitude_stats:time_entropy', 
'watch_acceleration:magnitude_spectrum:log_energy_band0', 'watch_acceleration:magnitude_spectrum:log_energy_band1', 'watch_acceleration:magnitude_spectrum:log_energy_band2', 'watch_acceleration:magnitude_spectrum:log_energy_band3', 'watch_acceleration:magnitude_spectrum:log_energy_band4', 'watch_acceleration:magnitude_spectrum:spectral_entropy', 'watch_acceleration:magnitude_autocorrelation:period', 'watch_acceleration:magnitude_autocorrelation:normalized_ac', 'watch_acceleration:3d:mean_x', 'watch_acceleration:3d:mean_y', 'watch_acceleration:3d:mean_z', 'watch_acceleration:3d:std_x', 'watch_acceleration:3d:std_y', 'watch_acceleration:3d:std_z', 'watch_acceleration:3d:ro_xy', 'watch_acceleration:3d:ro_xz', 'watch_acceleration:3d:ro_yz', 'watch_acceleration:spectrum:x_log_energy_band0', 'watch_acceleration:spectrum:x_log_energy_band1', 'watch_acceleration:spectrum:x_log_energy_band2', 'watch_acceleration:spectrum:x_log_energy_band3', 'watch_acceleration:spectrum:x_log_energy_band4', 'watch_acceleration:spectrum:y_log_energy_band0', 'watch_acceleration:spectrum:y_log_energy_band1', 'watch_acceleration:spectrum:y_log_energy_band2', 'watch_acceleration:spectrum:y_log_energy_band3', 'watch_acceleration:spectrum:y_log_energy_band4', 'watch_acceleration:spectrum:z_log_energy_band0', 'watch_acceleration:spectrum:z_log_energy_band1', 'watch_acceleration:spectrum:z_log_energy_band2', 'watch_acceleration:spectrum:z_log_energy_band3', 'watch_acceleration:spectrum:z_log_energy_band4', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range0', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range1', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range2', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range3', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range4', 'watch_heading:mean_cos', 'watch_heading:std_cos', 'watch_heading:mom3_cos', 'watch_heading:mom4_cos', 'watch_heading:mean_sin', 'watch_heading:std_sin', 'watch_heading:mom3_sin', 'watch_heading:mom4_sin', 'watch_heading:entropy_8bins', 'location:num_valid_updates', 'location:log_latitude_range', 'location:log_longitude_range', 'location:min_altitude', 'location:max_altitude', 'location:min_speed', 'location:max_speed', 'location:best_horizontal_accuracy', 'location:best_vertical_accuracy', 'location:diameter', 'location:log_diameter', 'location_quick_features:std_lat', 'location_quick_features:std_long', 'location_quick_features:lat_change', 'location_quick_features:long_change', 'location_quick_features:mean_abs_lat_deriv', 'location_quick_features:mean_abs_long_deriv', 'audio_naive:mfcc0:mean', 'audio_naive:mfcc1:mean', 'audio_naive:mfcc2:mean', 'audio_naive:mfcc3:mean', 'audio_naive:mfcc4:mean', 'audio_naive:mfcc5:mean', 'audio_naive:mfcc6:mean', 'audio_naive:mfcc7:mean', 'audio_naive:mfcc8:mean', 'audio_naive:mfcc9:mean', 'audio_naive:mfcc10:mean', 'audio_naive:mfcc11:mean', 'audio_naive:mfcc12:mean', 'audio_naive:mfcc0:std', 'audio_naive:mfcc1:std', 'audio_naive:mfcc2:std', 'audio_naive:mfcc3:std', 'audio_naive:mfcc4:std', 'audio_naive:mfcc5:std', 'audio_naive:mfcc6:std', 'audio_naive:mfcc7:std', 'audio_naive:mfcc8:std', 'audio_naive:mfcc9:std', 'audio_naive:mfcc10:std', 'audio_naive:mfcc11:std', 'audio_naive:mfcc12:std', 'audio_properties:max_abs_value', 'audio_properties:normalization_multiplier', 'lf_measurements:light', 'lf_measurements:pressure', 'lf_measurements:proximity_cm', 'lf_measurements:proximity', 'lf_measurements:relative_humidity', 
'lf_measurements:battery_level', 'lf_measurements:screen_brightness', 'lf_measurements:temperature_ambient', 'total_length']
WARNING:tensorflow:Layer lstm_28 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
Epoch 1/2
4924/4924 [==============================] - 101s 20ms/step - loss: 0.7920
Epoch 2/2
4924/4924 [==============================] - 98s 20ms/step - loss: 1.0072
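For context, a hedged sketch of what an epoch comparison like the one above can look like with a small Keras sequence model. This is not the notebook's actual architecture; the layer sizes, loss, and batch size are placeholders, and the "will not use cuDNN kernels" warning in the logs simply indicates an LSTM configuration that does not meet cuDNN's requirements (for example when masking or non-default options are used).

import numpy as np
import tensorflow as tf

def build_and_fit(X: np.ndarray, y: np.ndarray, epochs: int = 1) -> tf.keras.Model:
    # X: (n_sequences, timesteps, n_features) padded with zeros; y: (n_sequences, n_labels)
    model = tf.keras.Sequential([
        tf.keras.layers.Masking(mask_value=0.0, input_shape=X.shape[1:]),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(y.shape[1], activation='sigmoid'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy')
    model.fit(X, y, epochs=epochs, batch_size=32)
    return model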

Threshold 0 gave the lowest training loss (0.7676, versus roughly 0.90-1.01 for the higher thresholds), the median fill performed best among the fill options at threshold 5 (0.9555 vs. 1.0229 for mean and 0.9557 for zero), and a second epoch did not help (the loss rose from 0.7920 to 1.0072). We will therefore go with the lowest threshold, the median fill option, and a single epoch.¶

In [ ]:
model_name = "threshold0_median_epoch1"
models_with_threshhold[model_name], features_to_include[model_name]  = testing_threshold(0, -1, fill_type = 'median')
192 out of  279
Columns to remove due to excessive missing data: ['raw_acc:magnitude_stats:mean', 'raw_acc:magnitude_stats:std', 'raw_acc:magnitude_stats:moment3', 'raw_acc:magnitude_stats:moment4', 'raw_acc:magnitude_stats:percentile25', 'raw_acc:magnitude_stats:percentile50', 'raw_acc:magnitude_stats:percentile75', 'raw_acc:magnitude_stats:value_entropy', 'raw_acc:magnitude_stats:time_entropy', 'raw_acc:magnitude_spectrum:log_energy_band0', 'raw_acc:magnitude_spectrum:log_energy_band1', 'raw_acc:magnitude_spectrum:log_energy_band2', 'raw_acc:magnitude_spectrum:log_energy_band3', 'raw_acc:magnitude_spectrum:log_energy_band4', 'raw_acc:magnitude_spectrum:spectral_entropy', 'raw_acc:magnitude_autocorrelation:period', 'raw_acc:magnitude_autocorrelation:normalized_ac', 'raw_acc:3d:mean_x', 'raw_acc:3d:mean_y', 'raw_acc:3d:mean_z', 'raw_acc:3d:std_x', 'raw_acc:3d:std_y', 'raw_acc:3d:std_z', 'raw_acc:3d:ro_xy', 'raw_acc:3d:ro_xz', 'raw_acc:3d:ro_yz', 'proc_gyro:magnitude_stats:mean', 'proc_gyro:magnitude_stats:std', 'proc_gyro:magnitude_stats:moment3', 'proc_gyro:magnitude_stats:moment4', 'proc_gyro:magnitude_stats:percentile25', 'proc_gyro:magnitude_stats:percentile50', 'proc_gyro:magnitude_stats:percentile75', 'proc_gyro:magnitude_stats:value_entropy', 'proc_gyro:magnitude_stats:time_entropy', 'proc_gyro:magnitude_spectrum:log_energy_band0', 'proc_gyro:magnitude_spectrum:log_energy_band1', 'proc_gyro:magnitude_spectrum:log_energy_band2', 'proc_gyro:magnitude_spectrum:log_energy_band3', 'proc_gyro:magnitude_spectrum:log_energy_band4', 'proc_gyro:magnitude_spectrum:spectral_entropy', 'proc_gyro:magnitude_autocorrelation:period', 'proc_gyro:magnitude_autocorrelation:normalized_ac', 'proc_gyro:3d:mean_x', 'proc_gyro:3d:mean_y', 'proc_gyro:3d:mean_z', 'proc_gyro:3d:std_x', 'proc_gyro:3d:std_y', 'proc_gyro:3d:std_z', 'proc_gyro:3d:ro_xy', 'proc_gyro:3d:ro_xz', 'proc_gyro:3d:ro_yz', 'raw_magnet:magnitude_stats:mean', 'raw_magnet:magnitude_stats:std', 'raw_magnet:magnitude_stats:moment3', 'raw_magnet:magnitude_stats:moment4', 'raw_magnet:magnitude_stats:percentile25', 'raw_magnet:magnitude_stats:percentile50', 'raw_magnet:magnitude_stats:percentile75', 'raw_magnet:magnitude_stats:value_entropy', 'raw_magnet:magnitude_stats:time_entropy', 'raw_magnet:magnitude_spectrum:log_energy_band0', 'raw_magnet:magnitude_spectrum:log_energy_band1', 'raw_magnet:magnitude_spectrum:log_energy_band2', 'raw_magnet:magnitude_spectrum:log_energy_band3', 'raw_magnet:magnitude_spectrum:log_energy_band4', 'raw_magnet:magnitude_spectrum:spectral_entropy', 'raw_magnet:magnitude_autocorrelation:period', 'raw_magnet:magnitude_autocorrelation:normalized_ac', 'raw_magnet:3d:mean_x', 'raw_magnet:3d:mean_y', 'raw_magnet:3d:mean_z', 'raw_magnet:3d:std_x', 'raw_magnet:3d:std_y', 'raw_magnet:3d:std_z', 'raw_magnet:3d:ro_xy', 'raw_magnet:3d:ro_xz', 'raw_magnet:3d:ro_yz', 'raw_magnet:avr_cosine_similarity_lag_range0', 'raw_magnet:avr_cosine_similarity_lag_range1', 'raw_magnet:avr_cosine_similarity_lag_range2', 'raw_magnet:avr_cosine_similarity_lag_range3', 'raw_magnet:avr_cosine_similarity_lag_range4', 'watch_acceleration:magnitude_stats:mean', 'watch_acceleration:magnitude_stats:std', 'watch_acceleration:magnitude_stats:moment3', 'watch_acceleration:magnitude_stats:moment4', 'watch_acceleration:magnitude_stats:percentile25', 'watch_acceleration:magnitude_stats:percentile50', 'watch_acceleration:magnitude_stats:percentile75', 'watch_acceleration:magnitude_stats:value_entropy', 'watch_acceleration:magnitude_stats:time_entropy', 
'watch_acceleration:magnitude_spectrum:log_energy_band0', 'watch_acceleration:magnitude_spectrum:log_energy_band1', 'watch_acceleration:magnitude_spectrum:log_energy_band2', 'watch_acceleration:magnitude_spectrum:log_energy_band3', 'watch_acceleration:magnitude_spectrum:log_energy_band4', 'watch_acceleration:magnitude_spectrum:spectral_entropy', 'watch_acceleration:magnitude_autocorrelation:period', 'watch_acceleration:magnitude_autocorrelation:normalized_ac', 'watch_acceleration:3d:mean_x', 'watch_acceleration:3d:mean_y', 'watch_acceleration:3d:mean_z', 'watch_acceleration:3d:std_x', 'watch_acceleration:3d:std_y', 'watch_acceleration:3d:std_z', 'watch_acceleration:3d:ro_xy', 'watch_acceleration:3d:ro_xz', 'watch_acceleration:3d:ro_yz', 'watch_acceleration:spectrum:x_log_energy_band0', 'watch_acceleration:spectrum:x_log_energy_band1', 'watch_acceleration:spectrum:x_log_energy_band2', 'watch_acceleration:spectrum:x_log_energy_band3', 'watch_acceleration:spectrum:x_log_energy_band4', 'watch_acceleration:spectrum:y_log_energy_band0', 'watch_acceleration:spectrum:y_log_energy_band1', 'watch_acceleration:spectrum:y_log_energy_band2', 'watch_acceleration:spectrum:y_log_energy_band3', 'watch_acceleration:spectrum:y_log_energy_band4', 'watch_acceleration:spectrum:z_log_energy_band0', 'watch_acceleration:spectrum:z_log_energy_band1', 'watch_acceleration:spectrum:z_log_energy_band2', 'watch_acceleration:spectrum:z_log_energy_band3', 'watch_acceleration:spectrum:z_log_energy_band4', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range0', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range1', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range2', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range3', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range4', 'watch_heading:mean_cos', 'watch_heading:std_cos', 'watch_heading:mom3_cos', 'watch_heading:mom4_cos', 'watch_heading:mean_sin', 'watch_heading:std_sin', 'watch_heading:mom3_sin', 'watch_heading:mom4_sin', 'watch_heading:entropy_8bins', 'location:num_valid_updates', 'location:log_latitude_range', 'location:log_longitude_range', 'location:min_altitude', 'location:max_altitude', 'location:min_speed', 'location:max_speed', 'location:best_horizontal_accuracy', 'location:best_vertical_accuracy', 'location:diameter', 'location:log_diameter', 'location_quick_features:std_lat', 'location_quick_features:std_long', 'location_quick_features:lat_change', 'location_quick_features:long_change', 'location_quick_features:mean_abs_lat_deriv', 'location_quick_features:mean_abs_long_deriv', 'audio_naive:mfcc0:mean', 'audio_naive:mfcc1:mean', 'audio_naive:mfcc2:mean', 'audio_naive:mfcc3:mean', 'audio_naive:mfcc4:mean', 'audio_naive:mfcc5:mean', 'audio_naive:mfcc6:mean', 'audio_naive:mfcc7:mean', 'audio_naive:mfcc8:mean', 'audio_naive:mfcc9:mean', 'audio_naive:mfcc10:mean', 'audio_naive:mfcc11:mean', 'audio_naive:mfcc12:mean', 'audio_naive:mfcc0:std', 'audio_naive:mfcc1:std', 'audio_naive:mfcc2:std', 'audio_naive:mfcc3:std', 'audio_naive:mfcc4:std', 'audio_naive:mfcc5:std', 'audio_naive:mfcc6:std', 'audio_naive:mfcc7:std', 'audio_naive:mfcc8:std', 'audio_naive:mfcc9:std', 'audio_naive:mfcc10:std', 'audio_naive:mfcc11:std', 'audio_naive:mfcc12:std', 'audio_properties:max_abs_value', 'audio_properties:normalization_multiplier', 'lf_measurements:light', 'lf_measurements:pressure', 'lf_measurements:proximity_cm', 'lf_measurements:proximity', 'lf_measurements:relative_humidity', 
'lf_measurements:battery_level', 'lf_measurements:screen_brightness', 'lf_measurements:temperature_ambient', 'total_length']
WARNING:tensorflow:Layer lstm_24 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
4924/4924 [==============================] - 103s 21ms/step - loss: 0.8103
In [ ]:
best_model_one_so_far = 'threshold0_median_epoch1'
print(features_to_include[best_model_one_so_far])
['timestamp', 'discrete:app_state:is_active', 'discrete:app_state:is_inactive', 'discrete:app_state:is_background', 'discrete:app_state:missing', 'discrete:battery_plugged:is_ac', 'discrete:battery_plugged:is_usb', 'discrete:battery_plugged:is_wireless', 'discrete:battery_plugged:missing', 'discrete:battery_state:is_unknown', 'discrete:battery_state:is_unplugged', 'discrete:battery_state:is_not_charging', 'discrete:battery_state:is_discharging', 'discrete:battery_state:is_charging', 'discrete:battery_state:is_full', 'discrete:battery_state:missing', 'discrete:on_the_phone:is_False', 'discrete:on_the_phone:is_True', 'discrete:on_the_phone:missing', 'discrete:ringer_mode:is_normal', 'discrete:ringer_mode:is_silent_no_vibrate', 'discrete:ringer_mode:is_silent_with_vibrate', 'discrete:ringer_mode:missing', 'discrete:wifi_status:is_not_reachable', 'discrete:wifi_status:is_reachable_via_wifi', 'discrete:wifi_status:is_reachable_via_wwan', 'discrete:wifi_status:missing', 'discrete:time_of_day:between0and6', 'discrete:time_of_day:between3and9', 'discrete:time_of_day:between6and12', 'discrete:time_of_day:between9and15', 'discrete:time_of_day:between12and18', 'discrete:time_of_day:between15and21', 'discrete:time_of_day:between18and24', 'discrete:time_of_day:between21and3']
In [ ]:
import numpy as np
from sklearn.preprocessing import StandardScaler


def predict_from_df(df, model, features_to_include, look_back=3):
    """
    Process the given DataFrame and predict the next value using the LSTM model.
    
    Parameters:
    - df: DataFrame to process and predict from.
    - model: Trained LSTM model to use for predictions.
    - features_to_include: List of feature names to include in the prediction.
    - look_back: Number of previous time steps to use as input for predictions.
    
    Returns:
    - predictions: Predicted values for the next time step.
    """
    # Ensure the DataFrame contains the necessary features
    if not all(feature in df.columns for feature in features_to_include):
        raise ValueError("DataFrame missing required features")
    
    # Fill missing values with median
    df_filled = df[features_to_include].fillna(df[features_to_include].median())
    
    # Scale features
    scaler = StandardScaler().fit(df_filled)
    df_scaled = scaler.transform(df_filled)
    
    # Create sequences
    sequences = np.array([df_scaled[i - look_back:i] for i in range(look_back, len(df_scaled) + 1)])
    
    # Predict using the LSTM model
    predictions = model.predict(sequences)
    
    return predictions

# Example of how to use the function
# Ensure 'model', 'features_to_include', and 'look_back' are defined as per your model's training setup
predictions = predict_from_df(df_user.iloc[1:4],  models_with_threshhold[best_model_one_so_far], features_to_include[best_model_one_so_far], look_back=3)
print(predictions)
1/1 [==============================] - 0s 142ms/step
[[-0.04484926]]
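Note that predict_from_df above fits a fresh StandardScaler on whatever slice it is given, so the three-row slice here is scaled by its own statistics rather than by the training data's. A minimal variant (a sketch, assuming numpy is imported as above and that a scaler fitted on the training data is available) that reuses a pre-fitted scaler:

def predict_from_df_prefit(df, model, features_to_include, scaler, look_back=3):
    # Same flow as predict_from_df, but the scaler was fitted elsewhere (on training data)
    df_filled = df[features_to_include].fillna(df[features_to_include].median())
    df_scaled = scaler.transform(df_filled)
    sequences = np.array([df_scaled[i - look_back:i]
                          for i in range(look_back, len(df_scaled) + 1)])
    return model.predict(sequences)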
In [ ]:
# Calculate the total NaN count for each feature across all users
total_nan_counts = nan_counts_per_user.sum()

# `X_with_users` is the combined DataFrame of all users' examples.
# Total number of rows (entries per feature) across all users; this holds even if
# users contribute different numbers of examples.
total_entries_per_feature = len(X_with_users)

# Calculate the percentage of missing data for each feature
percentage_missing = (total_nan_counts / total_entries_per_feature) * 100

# Decide on a threshold for removing columns; 0 means any column with missing data is dropped
threshold = 0

# Identify columns that exceed this threshold
columns_to_remove = percentage_missing[percentage_missing > threshold].index.tolist()

print(len(columns_to_remove),"out of ", len(X_with_users.columns))
# Print out the columns to remove
print("Columns to remove due to excessive missing data:", columns_to_remove)

features = [feature for feature in features if feature not in columns_to_remove]
191 out of  279
Columns to remove due to excessive missing data: ['raw_acc:magnitude_stats:mean', 'raw_acc:magnitude_stats:std', 'raw_acc:magnitude_stats:moment3', 'raw_acc:magnitude_stats:moment4', 'raw_acc:magnitude_stats:percentile25', 'raw_acc:magnitude_stats:percentile50', 'raw_acc:magnitude_stats:percentile75', 'raw_acc:magnitude_stats:value_entropy', 'raw_acc:magnitude_stats:time_entropy', 'raw_acc:magnitude_spectrum:log_energy_band0', 'raw_acc:magnitude_spectrum:log_energy_band1', 'raw_acc:magnitude_spectrum:log_energy_band2', 'raw_acc:magnitude_spectrum:log_energy_band3', 'raw_acc:magnitude_spectrum:log_energy_band4', 'raw_acc:magnitude_spectrum:spectral_entropy', 'raw_acc:magnitude_autocorrelation:period', 'raw_acc:magnitude_autocorrelation:normalized_ac', 'raw_acc:3d:mean_x', 'raw_acc:3d:mean_y', 'raw_acc:3d:mean_z', 'raw_acc:3d:std_x', 'raw_acc:3d:std_y', 'raw_acc:3d:std_z', 'raw_acc:3d:ro_xy', 'raw_acc:3d:ro_xz', 'raw_acc:3d:ro_yz', 'proc_gyro:magnitude_stats:mean', 'proc_gyro:magnitude_stats:std', 'proc_gyro:magnitude_stats:moment3', 'proc_gyro:magnitude_stats:moment4', 'proc_gyro:magnitude_stats:percentile25', 'proc_gyro:magnitude_stats:percentile50', 'proc_gyro:magnitude_stats:percentile75', 'proc_gyro:magnitude_stats:value_entropy', 'proc_gyro:magnitude_stats:time_entropy', 'proc_gyro:magnitude_spectrum:log_energy_band0', 'proc_gyro:magnitude_spectrum:log_energy_band1', 'proc_gyro:magnitude_spectrum:log_energy_band2', 'proc_gyro:magnitude_spectrum:log_energy_band3', 'proc_gyro:magnitude_spectrum:log_energy_band4', 'proc_gyro:magnitude_spectrum:spectral_entropy', 'proc_gyro:magnitude_autocorrelation:period', 'proc_gyro:magnitude_autocorrelation:normalized_ac', 'proc_gyro:3d:mean_x', 'proc_gyro:3d:mean_y', 'proc_gyro:3d:mean_z', 'proc_gyro:3d:std_x', 'proc_gyro:3d:std_y', 'proc_gyro:3d:std_z', 'proc_gyro:3d:ro_xy', 'proc_gyro:3d:ro_xz', 'proc_gyro:3d:ro_yz', 'raw_magnet:magnitude_stats:mean', 'raw_magnet:magnitude_stats:std', 'raw_magnet:magnitude_stats:moment3', 'raw_magnet:magnitude_stats:moment4', 'raw_magnet:magnitude_stats:percentile25', 'raw_magnet:magnitude_stats:percentile50', 'raw_magnet:magnitude_stats:percentile75', 'raw_magnet:magnitude_stats:value_entropy', 'raw_magnet:magnitude_stats:time_entropy', 'raw_magnet:magnitude_spectrum:log_energy_band0', 'raw_magnet:magnitude_spectrum:log_energy_band1', 'raw_magnet:magnitude_spectrum:log_energy_band2', 'raw_magnet:magnitude_spectrum:log_energy_band3', 'raw_magnet:magnitude_spectrum:log_energy_band4', 'raw_magnet:magnitude_spectrum:spectral_entropy', 'raw_magnet:magnitude_autocorrelation:period', 'raw_magnet:magnitude_autocorrelation:normalized_ac', 'raw_magnet:3d:mean_x', 'raw_magnet:3d:mean_y', 'raw_magnet:3d:mean_z', 'raw_magnet:3d:std_x', 'raw_magnet:3d:std_y', 'raw_magnet:3d:std_z', 'raw_magnet:3d:ro_xy', 'raw_magnet:3d:ro_xz', 'raw_magnet:3d:ro_yz', 'raw_magnet:avr_cosine_similarity_lag_range0', 'raw_magnet:avr_cosine_similarity_lag_range1', 'raw_magnet:avr_cosine_similarity_lag_range2', 'raw_magnet:avr_cosine_similarity_lag_range3', 'raw_magnet:avr_cosine_similarity_lag_range4', 'watch_acceleration:magnitude_stats:mean', 'watch_acceleration:magnitude_stats:std', 'watch_acceleration:magnitude_stats:moment3', 'watch_acceleration:magnitude_stats:moment4', 'watch_acceleration:magnitude_stats:percentile25', 'watch_acceleration:magnitude_stats:percentile50', 'watch_acceleration:magnitude_stats:percentile75', 'watch_acceleration:magnitude_stats:value_entropy', 'watch_acceleration:magnitude_stats:time_entropy', 
'watch_acceleration:magnitude_spectrum:log_energy_band0', 'watch_acceleration:magnitude_spectrum:log_energy_band1', 'watch_acceleration:magnitude_spectrum:log_energy_band2', 'watch_acceleration:magnitude_spectrum:log_energy_band3', 'watch_acceleration:magnitude_spectrum:log_energy_band4', 'watch_acceleration:magnitude_spectrum:spectral_entropy', 'watch_acceleration:magnitude_autocorrelation:period', 'watch_acceleration:magnitude_autocorrelation:normalized_ac', 'watch_acceleration:3d:mean_x', 'watch_acceleration:3d:mean_y', 'watch_acceleration:3d:mean_z', 'watch_acceleration:3d:std_x', 'watch_acceleration:3d:std_y', 'watch_acceleration:3d:std_z', 'watch_acceleration:3d:ro_xy', 'watch_acceleration:3d:ro_xz', 'watch_acceleration:3d:ro_yz', 'watch_acceleration:spectrum:x_log_energy_band0', 'watch_acceleration:spectrum:x_log_energy_band1', 'watch_acceleration:spectrum:x_log_energy_band2', 'watch_acceleration:spectrum:x_log_energy_band3', 'watch_acceleration:spectrum:x_log_energy_band4', 'watch_acceleration:spectrum:y_log_energy_band0', 'watch_acceleration:spectrum:y_log_energy_band1', 'watch_acceleration:spectrum:y_log_energy_band2', 'watch_acceleration:spectrum:y_log_energy_band3', 'watch_acceleration:spectrum:y_log_energy_band4', 'watch_acceleration:spectrum:z_log_energy_band0', 'watch_acceleration:spectrum:z_log_energy_band1', 'watch_acceleration:spectrum:z_log_energy_band2', 'watch_acceleration:spectrum:z_log_energy_band3', 'watch_acceleration:spectrum:z_log_energy_band4', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range0', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range1', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range2', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range3', 'watch_acceleration:relative_directions:avr_cosine_similarity_lag_range4', 'watch_heading:mean_cos', 'watch_heading:std_cos', 'watch_heading:mom3_cos', 'watch_heading:mom4_cos', 'watch_heading:mean_sin', 'watch_heading:std_sin', 'watch_heading:mom3_sin', 'watch_heading:mom4_sin', 'watch_heading:entropy_8bins', 'location:num_valid_updates', 'location:log_latitude_range', 'location:log_longitude_range', 'location:min_altitude', 'location:max_altitude', 'location:min_speed', 'location:max_speed', 'location:best_horizontal_accuracy', 'location:best_vertical_accuracy', 'location:diameter', 'location:log_diameter', 'location_quick_features:std_lat', 'location_quick_features:std_long', 'location_quick_features:lat_change', 'location_quick_features:long_change', 'location_quick_features:mean_abs_lat_deriv', 'location_quick_features:mean_abs_long_deriv', 'audio_naive:mfcc0:mean', 'audio_naive:mfcc1:mean', 'audio_naive:mfcc2:mean', 'audio_naive:mfcc3:mean', 'audio_naive:mfcc4:mean', 'audio_naive:mfcc5:mean', 'audio_naive:mfcc6:mean', 'audio_naive:mfcc7:mean', 'audio_naive:mfcc8:mean', 'audio_naive:mfcc9:mean', 'audio_naive:mfcc10:mean', 'audio_naive:mfcc11:mean', 'audio_naive:mfcc12:mean', 'audio_naive:mfcc0:std', 'audio_naive:mfcc1:std', 'audio_naive:mfcc2:std', 'audio_naive:mfcc3:std', 'audio_naive:mfcc4:std', 'audio_naive:mfcc5:std', 'audio_naive:mfcc6:std', 'audio_naive:mfcc7:std', 'audio_naive:mfcc8:std', 'audio_naive:mfcc9:std', 'audio_naive:mfcc10:std', 'audio_naive:mfcc11:std', 'audio_naive:mfcc12:std', 'audio_properties:max_abs_value', 'audio_properties:normalization_multiplier', 'lf_measurements:light', 'lf_measurements:pressure', 'lf_measurements:proximity_cm', 'lf_measurements:proximity', 'lf_measurements:relative_humidity', 
'lf_measurements:battery_level', 'lf_measurements:screen_brightness', 'lf_measurements:temperature_ambient']
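The 0% threshold above drops every column with any missing value. A quick sweep (a sketch, reusing percentage_missing from the cell above) shows how the retained-feature count changes with the threshold:

for thr in [0, 1, 5, 10, 25, 50]:
    kept = int((percentage_missing <= thr).sum())
    print(f"threshold {thr}%: keep {kept} of {len(percentage_missing)} columns")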
In [ ]:
nan_count = df_user[features].isna().sum()
nan_count_sorted = nan_count.sort_values(ascending=False)
print(len(nan_count_sorted))
nan_count_sorted
35
Out[ ]:
timestamp                                      0
discrete:wifi_status:missing                   0
discrete:ringer_mode:is_silent_no_vibrate      0
discrete:ringer_mode:is_silent_with_vibrate    0
discrete:ringer_mode:missing                   0
discrete:wifi_status:is_not_reachable          0
discrete:wifi_status:is_reachable_via_wifi     0
discrete:wifi_status:is_reachable_via_wwan     0
discrete:time_of_day:between0and6              0
discrete:on_the_phone:missing                  0
discrete:time_of_day:between3and9              0
discrete:time_of_day:between6and12             0
discrete:time_of_day:between9and15             0
discrete:time_of_day:between12and18            0
discrete:time_of_day:between15and21            0
discrete:time_of_day:between18and24            0
discrete:ringer_mode:is_normal                 0
discrete:on_the_phone:is_True                  0
discrete:app_state:is_active                   0
discrete:battery_plugged:missing               0
discrete:app_state:is_inactive                 0
discrete:app_state:is_background               0
discrete:app_state:missing                     0
discrete:battery_plugged:is_ac                 0
discrete:battery_plugged:is_usb                0
discrete:battery_plugged:is_wireless           0
discrete:battery_state:is_unknown              0
discrete:on_the_phone:is_False                 0
discrete:battery_state:is_unplugged            0
discrete:battery_state:is_not_charging         0
discrete:battery_state:is_discharging          0
discrete:battery_state:is_charging             0
discrete:battery_state:is_full                 0
discrete:battery_state:missing                 0
discrete:time_of_day:between21and3             0
dtype: int64
In [ ]:
# First User 

# Add more features as necessary
df_user = df_user[features].ffill()  # forward-fill missing values (fillna(method='ffill') is deprecated)
scaler = StandardScaler()
df_user[features] = scaler.fit_transform(df_user)

# Define LSTM model architecture
def create_lstm_model(input_shape):
    model = Sequential([
        LSTM(50, activation='relu', input_shape=input_shape),
        Dense(1) 
    ])
    model.compile(optimizer=Adam(learning_rate=0.001, clipvalue=0.5), loss='mse')  # Apply gradient clipping

    return model


# Ensure the rows are ordered by timestamp before building sequences
df_user.sort_values('timestamp', inplace=True)

# Convert df_user into overlapping sequences for the LSTM
look_back = 5
generator = TimeseriesGenerator(df_user[features].values, df_user[features].values,
                                length=look_back, batch_size=1)

# Create and train LSTM model on the selected user's data
model = create_lstm_model((look_back, len(features)))
model.fit(generator, epochs=2, verbose=1)  # Adjust epochs and verbosity as needed
WARNING:tensorflow:Layer lstm_9 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
/var/folders/9d/r4wkb8dj54b5k6_vd_8stbq00000gn/T/ipykernel_35776/3336542835.py:17: FutureWarning: DataFrame.fillna with 'method' is deprecated and will raise in a future version. Use obj.ffill() or obj.bfill() instead.
  df_user = df_user[features].fillna(method='ffill')
WARNING:tensorflow:Layer lstm_9 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
Epoch 1/2
2024-02-12 10:47:55.092965: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:961] model_pruner failed: INVALID_ARGUMENT: Graph does not contain terminal node Adam/AssignAddVariableOp_6.
4922/4922 [==============================] - 169s 34ms/step - loss: 0.7789
Epoch 2/2
4922/4922 [==============================] - 167s 34ms/step - loss: 0.7444
Out[ ]:
<keras.src.callbacks.History at 0x325e19750>
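The cuDNN warning above is triggered by activation='relu'; the fused cuDNN LSTM kernel requires the default tanh activation (among other criteria). A sketch of the same architecture with the default activation, under that assumption:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.optimizers import Adam

def create_lstm_model_cudnn(input_shape):
    model = Sequential([
        LSTM(50, input_shape=input_shape),  # default activation='tanh' meets the cuDNN criteria
        Dense(1)
    ])
    model.compile(optimizer=Adam(learning_rate=0.001, clipvalue=0.5), loss='mse')
    return model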
In [ ]:
df_test = X_with_users[X_with_users['user_id'] == users[3]]
In [ ]:
# Predict and fill missing values function
def predict_and_fill_missing_values(data, model, feature_columns, look_back):
    for i in range(look_back, len(data)):  # need look_back previous rows to form an input sequence
        if pd.isnull(data[feature_columns].iloc[i]).any():  # Check if any feature value is missing
            input_seq = data[feature_columns].iloc[i - look_back:i].values
            input_seq = scaler.transform(input_seq)  # Normalize the input with the previously fitted scaler
            input_seq = input_seq.reshape((1, look_back, len(feature_columns)))
            predicted_value = model.predict(input_seq)
            # Fill with the prediction (assumes the model outputs the full scaled feature vector)
            data.iloc[i, [data.columns.get_loc(c) for c in feature_columns]] = scaler.inverse_transform(predicted_value)[0]
    return data



# Fill missing values in the original DataFrame
df_filled_with_predictions = predict_and_fill_missing_values(df_test, model, features, look_back)
In [ ]:
for column in features:
    col_pos = df_test.columns.get_loc(column)
    # Find positional indices with missing values for the current column
    missing_positions = np.where(df_test[column].isnull().values)[0]

    for pos in missing_positions:
        # Check if there are enough previous data points
        if pos >= look_back:
            # Prepare the input sequence for prediction
            # All features in the 'features' list are used as model input
            input_sequence = df_test[features].iloc[pos - look_back:pos].values
            input_sequence = scaler.transform(input_sequence)  # Scale with the previously fitted scaler
            input_sequence = input_sequence.reshape((1, look_back, len(features)))

            # Predict the missing value (the model outputs a single scaled value)
            predicted_value = model.predict(input_sequence)

            # Undo the scaling for this column only (StandardScaler keeps per-column mean_ and scale_)
            feat_pos = features.index(column)
            unscaled_value = predicted_value[0, 0] * scaler.scale_[feat_pos] + scaler.mean_[feat_pos]

            # Update the DataFrame with the predicted value
            df_test.iloc[pos, col_pos] = unscaled_value
In [ ]:
y = combined_csv_data[output_columns]
In [ ]:
def missing_value_check(df, df_name="df"):
    missing_values = df.isna().sum()
    missing_values = missing_values[missing_values > 0]

    if len(missing_values) > 0:
        plt.figure(figsize=(15, 60))
        missing_values.sort_values(ascending=True).plot(kind='barh')
        plt.title(f'Missing Values in Each Column ({df_name})')
        plt.xlabel('Number of Missing Values')
        plt.ylabel('Columns')
        plt.show()

    else:
        print('All the missing values have been covered.')
In [ ]:
missing_value_check(X, 'X')
X.fillna(-1, inplace=True)
# TODO: find a better way to impute the missing values (one option is sketched below)
[Figure: horizontal bar chart of missing-value counts per column in X]
/var/folders/9d/r4wkb8dj54b5k6_vd_8stbq00000gn/T/ipykernel_4080/2398456764.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  X.fillna(-1, inplace=True)
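One option for the TODO above (a sketch; assumes X holds only numeric columns): per-column median imputation instead of a constant -1 sentinel. Note that SimpleImputer drops columns that are entirely NaN.

from sklearn.impute import SimpleImputer
import pandas as pd

imputer = SimpleImputer(strategy='median')
X_median = pd.DataFrame(imputer.fit_transform(X),
                        columns=imputer.get_feature_names_out(),
                        index=X.index)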
In [ ]:
missing_value_check(y, 'y')
y.fillna(0, inplace=True)
missing_value_check(y, 'y')
[Figure: horizontal bar chart of missing-value counts per column in y]
All the missing values have been covered.
/var/folders/9d/r4wkb8dj54b5k6_vd_8stbq00000gn/T/ipykernel_4080/1312125286.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  y.fillna(0, inplace=True)

Model Testing¶

In [ ]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
In [ ]:
import tensorflow as tf


model = tf.keras.Sequential([
    tf.keras.Input(shape=(len(input_columns),)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(len(output_columns))
])
In [ ]:
model.compile(optimizer='adam', loss='mse')

JUST AN IMAGE¶

Only a few epochs are needed:

[Embedded images: image.png, image-2.png]

In [ ]:
epochs = 3
batch_size = 32

history = model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, validation_split=0.2)
Epoch 1/3
514/514 [==============================] - 10s 18ms/step - loss: 265207989403648.0000 - val_loss: 16220.5537
Epoch 2/3
514/514 [==============================] - 2s 5ms/step - loss: 16198.3203 - val_loss: 15281.2402
Epoch 3/3
514/514 [==============================] - 2s 5ms/step - loss: 493791648.0000 - val_loss: 82575944.0000
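One likely contributor to the unstable losses above is that unbounded linear outputs are being regressed against 0/1 label columns with MSE. A sketch of a multi-label classification setup instead (sigmoid output per label with binary cross-entropy), assuming the same input_columns, output_columns, and train split:

import tensorflow as tf

clf = tf.keras.Sequential([
    tf.keras.Input(shape=(len(input_columns),)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(len(output_columns), activation='sigmoid')  # one probability per label
])
clf.compile(optimizer='adam', loss='binary_crossentropy', metrics=[tf.keras.metrics.AUC()])
# clf.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, validation_split=0.2)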
In [ ]:
# Save the entire model to a file
model.save("tf_model_v2.h5")
/Users/zaina/miniconda3/lib/python3.10/site-packages/keras/src/engine/training.py:3103: UserWarning: You are saving your model as an HDF5 file via `model.save()`. This file format is considered legacy. We recommend using instead the native Keras format, e.g. `model.save('my_model.keras')`.
  saving_api.save_model(
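As the warning suggests, the native Keras format can be used instead:

model.save("tf_model_v2.keras")  # native Keras format instead of legacy HDF5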
In [ ]:
print(y_test[1:3])
y_pred_0 = model.predict(X_test[1:3])
print(y_pred_0)
      label:LYING_DOWN  label:SITTING  label:FIX_walking  label:FIX_running  \
4209               1.0            0.0                0.0                0.0   
3455               0.0            0.0                0.0                0.0   

      label:BICYCLING  label:SLEEPING  label:LAB_WORK  label:IN_CLASS  \
4209              0.0             1.0             0.0             0.0   
3455              1.0             0.0             0.0             0.0   

      label:IN_A_MEETING  label:LOC_main_workplace  label:OR_indoors  \
4209                 0.0                       0.0               1.0   
3455                 0.0                       0.0               0.0   

      label:OR_outside  label:IN_A_CAR  label:ON_A_BUS  \
4209               0.0             0.0             0.0   
3455               0.0             0.0             0.0   

      label:DRIVE_-_I_M_THE_DRIVER  label:DRIVE_-_I_M_A_PASSENGER  \
4209                           0.0                            0.0   
3455                           0.0                            0.0   

      label:LOC_home  label:FIX_restaurant  label:PHONE_IN_POCKET  \
4209             0.0                   0.0                    0.0   
3455             0.0                   0.0                    0.0   

      label:OR_exercise  label:COOKING  label:SHOPPING  label:STROLLING  \
4209                0.0            0.0             0.0              0.0   
3455                1.0            0.0             0.0              0.0   

      label:DRINKING__ALCOHOL_  label:BATHING_-_SHOWER  label:CLEANING  \
4209                       0.0                     0.0             0.0   
3455                       0.0                     0.0             0.0   

      label:DOING_LAUNDRY  label:WASHING_DISHES  label:WATCHING_TV  \
4209                  0.0                   0.0                0.0   
3455                  0.0                   0.0                0.0   

      label:SURFING_THE_INTERNET  label:AT_A_PARTY  label:AT_A_BAR  \
4209                         0.0               0.0             0.0   
3455                         0.0               0.0             0.0   

      label:LOC_beach  label:SINGING  label:TALKING  label:COMPUTER_WORK  \
4209              0.0            0.0            0.0                  0.0   
3455              0.0            0.0            0.0                  0.0   

      label:EATING  label:TOILET  label:GROOMING  label:DRESSING  \
4209           0.0           0.0             0.0             0.0   
3455           0.0           0.0             0.0             0.0   

      label:AT_THE_GYM  label:STAIRS_-_GOING_UP  label:STAIRS_-_GOING_DOWN  \
4209               0.0                      0.0                        0.0   
3455               0.0                      0.0                        0.0   

      label:ELEVATOR  label:OR_standing  label:AT_SCHOOL  label:PHONE_IN_HAND  \
4209             0.0                0.0              0.0                  0.0   
3455             0.0                0.0              0.0                  0.0   

      label:PHONE_IN_BAG  label:PHONE_ON_TABLE  label:WITH_CO-WORKERS  \
4209                 0.0                   0.0                    0.0   
3455                 0.0                   0.0                    0.0   

      label:WITH_FRIENDS  
4209                 0.0  
3455                 1.0  
1/1 [==============================] - 0s 46ms/step
[[-2.59731178e+01  1.27983383e+02  9.80889738e-01 -1.23986526e+02
  -2.96019501e+02  8.20305176e+01  1.28011810e+02  3.19692898e+01
   6.39938889e+01 -1.47975021e+02  4.20986694e+02  2.19858627e+01
  -2.10005844e+02  1.11979340e+02  2.07982361e+02  5.60200195e+01
  -7.19873886e+01  3.59612007e+01  3.61966370e+02  3.60004822e+02
  -6.64010315e+02  5.60046501e+01 -3.36010376e+02  2.23978058e+02
   7.60032578e+01 -6.50400039e+04  2.27999527e+02 -9.59694061e+01
   1.32010544e+02  2.92007446e+02  1.59852057e+01  1.11990677e+02
  -8.99841309e+01 -1.27992393e+02 -1.84007034e+02  2.01966812e+02
  -1.47997574e+02  6.40029984e+01 -2.02013046e+02  4.00207596e+01
  -3.12010712e+02  1.19813356e+01  1.11974045e+02  2.47998413e+02
  -2.23997925e+02  1.36005981e+02 -1.56002060e+02  8.00049820e+01
   2.00016113e+02  5.24007385e+02  8.80009766e+01]
 [-2.19731178e+01  1.31983383e+02  1.49808893e+01 -1.13986526e+02
  -3.06019501e+02  7.00305176e+01  1.28011810e+02  7.96929073e+00
   8.79938889e+01 -1.59975021e+02  4.32986694e+02  2.59858627e+01
  -2.18005844e+02  1.31979340e+02  1.79982361e+02  6.40200195e+01
  -7.99873886e+01  3.99612007e+01  3.43966370e+02  3.62004822e+02
  -6.38010315e+02  7.20046463e+01 -3.24010376e+02  2.15978058e+02
   6.40032578e+01 -6.50140039e+04  2.43999527e+02 -9.19694061e+01
   1.50010544e+02  3.00007446e+02 -6.01479435e+00  1.00990677e+02
  -1.09984131e+02 -1.35992401e+02 -1.76007034e+02  2.07966812e+02
  -1.23997574e+02  7.20029984e+01 -1.86013046e+02  2.07594633e-02
  -3.36010712e+02  3.09813347e+01  9.59740448e+01  2.31998413e+02
  -2.23997925e+02  1.56005981e+02 -1.60002060e+02  8.40049820e+01
   1.96016113e+02  5.26007385e+02  7.60009766e+01]]
In [ ]:
loss = model.evaluate(X_test, y_test)
print(f"Test Loss: {loss:.4f}")

# Predictions
y_pred = model.predict(X_test)
276/276 [==============================] - 1s 4ms/step - loss: 82582624.0000
Test Loss: 82582624.0000
276/276 [==============================] - 1s 2ms/step
In [ ]:
print("X_test shape:", X_test.shape)
print("y_test shape:", y_test.shape)
X_test shape: (8806, 226)
y_test shape: (8806, 51)
In [ ]:
# Extracting loss and validation loss values
training_loss = history.history['loss']
validation_loss = history.history['val_loss']

# Creating epoch numbers (starting from 1)
epochs_range = range(1, epochs + 1)

# Plotting the training and validation loss
plt.figure(figsize=(8, 4))
plt.plot(epochs_range, training_loss, 'bo-', label='Training Loss')
plt.plot(epochs_range, validation_loss, 'ro-', label='Validation Loss')
plt.title('Training and Validation Loss per Epoch')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

plt.show()
[Figure: training and validation loss per epoch]
In [ ]:
!pip3 freeze > requirements.txt
In [ ]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
import pandas as pd
import numpy as np

# Assuming combined_csv_data is your DataFrame and it has been loaded already



# Before splitting, ensure there are no NaN values in your output columns
for col in output_columns:
    combined_csv_data[col].fillna(0, inplace=True)  # Replace NaN in y with 0, if appropriate

# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(combined_csv_data[input_columns], combined_csv_data[output_columns], test_size=0.2, random_state=42)

predictions = {}

for output_col in output_columns:
    # Create a pipeline with an imputer (to fill missing values in features) and logistic regression
    pipeline = make_pipeline(
        SimpleImputer(strategy='mean'),  # Fills missing X values with the mean of each column
        LogisticRegression(max_iter=1000)  # Increased max_iter to ensure convergence
    )
    
    # Fit the pipeline to the training data
    pipeline.fit(X_train, y_train[output_col])
    
    # New data for prediction. This example is simplified and should be replaced with actual new data.
    # Ensure X_new has the same number of features as X_train. Here, we use np.nan as placeholders.
    X_new = np.array([[0.5, 1.2] + [np.nan] * (224)])  # Adjusted to match the feature count of the trained model
    
    # Predicting the probability for the given X_new
    pred_prob = pipeline.predict_proba(X_new)[0][1]
    
    # Storing the prediction
    predictions[output_col] = pred_prob

# Displaying the predicted probabilities
for y_col, prob in predictions.items():
    print(f"Predicted probability for {y_col}: {prob:.2%}")
/var/folders/9d/r4wkb8dj54b5k6_vd_8stbq00000gn/T/ipykernel_4080/3046016783.py:14: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  combined_csv_data[col].fillna(0, inplace=True)  # Replace NaN in y with 0, if appropriate
/Users/zaina/miniconda3/lib/python3.10/site-packages/sklearn/impute/_base.py:565: UserWarning: Skipping features without any observed values: ['location:min_altitude' 'location:max_altitude'
 'location:best_vertical_accuracy' 'lf_measurements:proximity'
 'lf_measurements:screen_brightness']. At least one non-missing value is needed for imputation with strategy='mean'.
  warnings.warn(
/Users/zaina/miniconda3/lib/python3.10/site-packages/sklearn/base.py:493: UserWarning: X does not have valid feature names, but SimpleImputer was fitted with feature names
  warnings.warn(
[... the SimpleImputer "Skipping features without any observed values" and "X does not have valid feature names" warnings above repeat for each remaining output column ...]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[26], line 29
     23 pipeline = make_pipeline(
     24     SimpleImputer(strategy='mean'),  # Fills missing X values with the mean of each column
     25     LogisticRegression(max_iter=1000)  # Increased max_iter to ensure convergence
     26 )
     28 # Fit the pipeline to the training data
---> 29 pipeline.fit(X_train, y_train[output_col])
     31 # New data for prediction. This example is simplified and should be replaced with actual new data.
     32 # Ensure X_new has the same number of features as X_train. Here, we use np.nan as placeholders.
     33 X_new = np.array([[0.5, 1.2] + [np.nan] * (224)])  # Adjusted to match the feature count of the trained model

File ~/miniconda3/lib/python3.10/site-packages/sklearn/base.py:1351, in _fit_context.<locals>.decorator.<locals>.wrapper(estimator, *args, **kwargs)
   1344     estimator._validate_params()
   1346 with config_context(
   1347     skip_parameter_validation=(
   1348         prefer_skip_nested_validation or global_skip_validation
   1349     )
   1350 ):
-> 1351     return fit_method(estimator, *args, **kwargs)

File ~/miniconda3/lib/python3.10/site-packages/sklearn/pipeline.py:475, in Pipeline.fit(self, X, y, **params)
    473     if self._final_estimator != "passthrough":
    474         last_step_params = routed_params[self.steps[-1][0]]
--> 475         self._final_estimator.fit(Xt, y, **last_step_params["fit"])
    477 return self

File ~/miniconda3/lib/python3.10/site-packages/sklearn/base.py:1351, in _fit_context.<locals>.decorator.<locals>.wrapper(estimator, *args, **kwargs)
   1344     estimator._validate_params()
   1346 with config_context(
   1347     skip_parameter_validation=(
   1348         prefer_skip_nested_validation or global_skip_validation
   1349     )
   1350 ):
-> 1351     return fit_method(estimator, *args, **kwargs)

File ~/miniconda3/lib/python3.10/site-packages/sklearn/linear_model/_logistic.py:1246, in LogisticRegression.fit(self, X, y, sample_weight)
   1244 classes_ = self.classes_
   1245 if n_classes < 2:
-> 1246     raise ValueError(
   1247         "This solver needs samples of at least 2 classes"
   1248         " in the data, but the data contains only one"
   1249         " class: %r"
   1250         % classes_[0]
   1251     )
   1253 if len(self.classes_) == 2:
   1254     n_classes = 1

ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 0.0
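The ValueError above occurs when a label column contains only one class in the training split. One way around it (a sketch reusing X_train, y_train, output_columns, and the imports from the failing cell) is to skip such columns:

fitted_pipelines = {}
for output_col in output_columns:
    if y_train[output_col].nunique() < 2:
        print(f"Skipping {output_col}: only one class present in y_train")
        continue
    pipeline = make_pipeline(
        SimpleImputer(strategy='mean'),
        LogisticRegression(max_iter=1000)
    )
    pipeline.fit(X_train, y_train[output_col])
    fitted_pipelines[output_col] = pipeline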
In [ ]:
len(predictions)
Out[ ]:
51
In [ ]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
import pandas as pd
import numpy as np
import warnings


warnings.filterwarnings("ignore", message="X does not have valid feature names, but SimpleImputer was fitted with feature names")

# Assumes X (feature columns) and y (label columns) have already been built from combined_csv_data

# Fill missing values in output columns with a default value (e.g., 0)
y.fillna(0, inplace=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Dictionary to store the pipeline for each output column
pipelines = {}

# Train a pipeline for each output column
for output_col in output_columns:
    pipeline = make_pipeline(
        SimpleImputer(strategy='mean'),  # Impute missing values
        LogisticRegression(max_iter=1000)  # Logistic regression
    )
    # Fit the pipeline on the training data
    pipeline.fit(X_train, y_train[output_col])
    pipelines[output_col] = pipeline

# Predicting the probabilities for each row in X_test
predictions = {col: [] for col in output_columns}  # Initialize dictionary to store predictions

for index, row in X_test.iterrows():
    for output_col in output_columns:
        # Predict the probability for the current row and output column
        pred_prob = pipelines[output_col].predict_proba(row.values.reshape(1, -1))[0][1]
        predictions[output_col].append(pred_prob)

# Optionally, print out the predicted probabilities for the first few rows of X_test
for i, (index, row) in enumerate(X_test.iterrows()):
    if i >= 5:  # Limit output to first 5 rows
        break
    print(f"Predictions for row {index}:")
    for output_col in output_columns:
        print(f"  {output_col}: {predictions[output_col][i]:.2%}")
    print()  # Newline for readability
/var/folders/9d/r4wkb8dj54b5k6_vd_8stbq00000gn/T/ipykernel_43744/1352247473.py:16: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  y.fillna(0, inplace=True)
Predictions for row 225548:
  label:LYING_DOWN: 45.00%
  label:SITTING: 36.16%
  label:FIX_walking: 2.10%
  label:FIX_running: 0.29%
  label:BICYCLING: 0.49%
  label:SLEEPING: 22.02%
  label:LAB_WORK: 0.57%
  label:IN_CLASS: 1.63%
  label:IN_A_MEETING: 1.36%
  label:LOC_main_workplace: 9.00%
  label:OR_indoors: 63.17%
  label:OR_outside: 3.21%
  label:IN_A_CAR: 0.65%
  label:ON_A_BUS: 0.47%
  label:DRIVE_-_I_M_THE_DRIVER: 2.12%
  label:DRIVE_-_I_M_A_PASSENGER: 0.67%
  label:LOC_home: 49.30%
  label:FIX_restaurant: 0.55%
  label:PHONE_IN_POCKET: 6.24%
  label:OR_exercise: 0.83%
  label:COOKING: 1.06%
  label:SHOPPING: 0.49%
  label:STROLLING: 0.21%
  label:DRINKING__ALCOHOL_: 0.39%
  label:BATHING_-_SHOWER: 0.55%
  label:CLEANING: 1.02%
  label:DOING_LAUNDRY: 0.15%
  label:WASHING_DISHES: 0.33%
  label:WATCHING_TV: 3.51%
  label:SURFING_THE_INTERNET: 5.18%
  label:AT_A_PARTY: 0.39%
  label:AT_A_BAR: 0.15%
  label:LOC_beach: 0.09%
  label:SINGING: 0.17%
  label:TALKING: 9.61%
  label:COMPUTER_WORK: 8.82%
  label:EATING: 2.66%
  label:TOILET: 0.56%
  label:GROOMING: 0.80%
  label:DRESSING: 0.59%
  label:AT_THE_GYM: 0.09%
  label:STAIRS_-_GOING_UP: 0.06%
  label:STAIRS_-_GOING_DOWN: 0.20%
  label:ELEVATOR: 0.05%
  label:OR_standing: 10.00%
  label:AT_SCHOOL: 4.26%
  label:PHONE_IN_HAND: 3.87%
  label:PHONE_IN_BAG: 1.31%
  label:PHONE_ON_TABLE: 30.48%
  label:WITH_CO-WORKERS: 1.65%
  label:WITH_FRIENDS: 3.26%

Predictions for row 22229:
  label:LYING_DOWN: 45.50%
  label:SITTING: 36.13%
  label:FIX_walking: 4.86%
  label:FIX_running: 0.28%
  label:BICYCLING: 0.11%
  label:SLEEPING: 21.97%
  label:LAB_WORK: 0.26%
  label:IN_CLASS: 1.62%
  label:IN_A_MEETING: 1.35%
  label:LOC_main_workplace: 8.96%
  label:OR_indoors: 64.53%
  label:OR_outside: 3.19%
  label:IN_A_CAR: 0.97%
  label:ON_A_BUS: 0.47%
  label:DRIVE_-_I_M_THE_DRIVER: 2.10%
  label:DRIVE_-_I_M_A_PASSENGER: 0.66%
  label:LOC_home: 50.70%
  label:FIX_restaurant: 0.54%
  label:PHONE_IN_POCKET: 6.20%
  label:OR_exercise: 0.21%
  label:COOKING: 1.05%
  label:SHOPPING: 0.48%
  label:STROLLING: 0.21%
  label:DRINKING__ALCOHOL_: 0.39%
  label:BATHING_-_SHOWER: 0.54%
  label:CLEANING: 1.00%
  label:DOING_LAUNDRY: 0.14%
  label:WASHING_DISHES: 0.33%
  label:WATCHING_TV: 3.48%
  label:SURFING_THE_INTERNET: 5.15%
  label:AT_A_PARTY: 0.39%
  label:AT_A_BAR: 0.15%
  label:LOC_beach: 0.09%
  label:SINGING: 0.17%
  label:TALKING: 9.56%
  label:COMPUTER_WORK: 2.17%
  label:EATING: 5.67%
  label:TOILET: 1.02%
  label:GROOMING: 0.79%
  label:DRESSING: 0.58%
  label:AT_THE_GYM: 0.10%
  label:STAIRS_-_GOING_UP: 0.10%
  label:STAIRS_-_GOING_DOWN: 0.19%
  label:ELEVATOR: 0.05%
  label:OR_standing: 9.95%
  label:AT_SCHOOL: 4.24%
  label:PHONE_IN_HAND: 3.84%
  label:PHONE_IN_BAG: 4.32%
  label:PHONE_ON_TABLE: 30.44%
  label:WITH_CO-WORKERS: 1.63%
  label:WITH_FRIENDS: 7.56%

Predictions for row 257345:
  label:LYING_DOWN: 31.26%
  label:SITTING: 36.12%
  label:FIX_walking: 4.04%
  label:FIX_running: 0.28%
  label:BICYCLING: 0.57%
  label:SLEEPING: 21.95%
  label:LAB_WORK: 0.48%
  label:IN_CLASS: 1.61%
  label:IN_A_MEETING: 1.34%
  label:LOC_main_workplace: 8.94%
  label:OR_indoors: 53.19%
  label:OR_outside: 3.18%
  label:IN_A_CAR: 0.30%
  label:ON_A_BUS: 0.46%
  label:DRIVE_-_I_M_THE_DRIVER: 2.09%
  label:DRIVE_-_I_M_A_PASSENGER: 0.66%
  label:LOC_home: 48.89%
  label:FIX_restaurant: 0.54%
  label:PHONE_IN_POCKET: 6.19%
  label:OR_exercise: 1.68%
  label:COOKING: 1.04%
  label:SHOPPING: 0.48%
  label:STROLLING: 0.21%
  label:DRINKING__ALCOHOL_: 0.38%
  label:BATHING_-_SHOWER: 0.54%
  label:CLEANING: 1.00%
  label:DOING_LAUNDRY: 0.14%
  label:WASHING_DISHES: 0.32%
  label:WATCHING_TV: 3.47%
  label:SURFING_THE_INTERNET: 5.13%
  label:AT_A_PARTY: 0.38%
  label:AT_A_BAR: 0.15%
  label:LOC_beach: 0.21%
  label:SINGING: 0.17%
  label:TALKING: 9.54%
  label:COMPUTER_WORK: 6.79%
  label:EATING: 3.83%
  label:TOILET: 0.51%
  label:GROOMING: 0.79%
  label:DRESSING: 0.58%
  label:AT_THE_GYM: 0.36%
  label:STAIRS_-_GOING_UP: 0.27%
  label:STAIRS_-_GOING_DOWN: 0.19%
  label:ELEVATOR: 0.05%
  label:OR_standing: 9.94%
  label:AT_SCHOOL: 19.61%
  label:PHONE_IN_HAND: 3.83%
  label:PHONE_IN_BAG: 2.26%
  label:PHONE_ON_TABLE: 30.43%
  label:WITH_CO-WORKERS: 1.63%
  label:WITH_FRIENDS: 8.64%

Predictions for row 291119:
  label:LYING_DOWN: 27.18%
  label:SITTING: 36.20%
  label:FIX_walking: 4.28%
  label:FIX_running: 0.29%
  label:BICYCLING: 0.63%
  label:SLEEPING: 22.09%
  label:LAB_WORK: 0.81%
  label:IN_CLASS: 1.65%
  label:IN_A_MEETING: 1.38%
  label:LOC_main_workplace: 9.06%
  label:OR_indoors: 50.69%
  label:OR_outside: 3.24%
  label:IN_A_CAR: 0.63%
  label:ON_A_BUS: 0.48%
  label:DRIVE_-_I_M_THE_DRIVER: 2.14%
  label:DRIVE_-_I_M_A_PASSENGER: 0.68%
  label:LOC_home: 43.66%
  label:FIX_restaurant: 0.55%
  label:PHONE_IN_POCKET: 6.28%
  label:OR_exercise: 1.48%
  label:COOKING: 1.07%
  label:SHOPPING: 0.50%
  label:STROLLING: 0.22%
  label:DRINKING__ALCOHOL_: 0.40%
  label:BATHING_-_SHOWER: 0.56%
  label:CLEANING: 1.03%
  label:DOING_LAUNDRY: 0.15%
  label:WASHING_DISHES: 0.34%
  label:WATCHING_TV: 3.54%
  label:SURFING_THE_INTERNET: 5.22%
  label:AT_A_PARTY: 0.40%
  label:AT_A_BAR: 0.15%
  label:LOC_beach: 0.16%
  label:SINGING: 0.18%
  label:TALKING: 9.67%
  label:COMPUTER_WORK: 14.49%
  label:EATING: 4.10%
  label:TOILET: 0.61%
  label:GROOMING: 0.81%
  label:DRESSING: 0.60%
  label:AT_THE_GYM: 0.27%
  label:STAIRS_-_GOING_UP: 0.20%
  label:STAIRS_-_GOING_DOWN: 0.20%
  label:ELEVATOR: 0.05%
  label:OR_standing: 10.06%
  label:AT_SCHOOL: 13.05%
  label:PHONE_IN_HAND: 3.90%
  label:PHONE_IN_BAG: 2.36%
  label:PHONE_ON_TABLE: 30.54%
  label:WITH_CO-WORKERS: 1.67%
  label:WITH_FRIENDS: 6.86%

Predictions for row 47990:
  label:LYING_DOWN: 29.81%
  label:SITTING: 36.18%
  label:FIX_walking: 5.65%
  label:FIX_running: 0.29%
  label:BICYCLING: 1.62%
  label:SLEEPING: 22.05%
  label:LAB_WORK: 1.07%
  label:IN_CLASS: 1.64%
  label:IN_A_MEETING: 1.37%
  label:LOC_main_workplace: 9.03%
  label:OR_indoors: 49.20%
  label:OR_outside: 3.23%
  label:IN_A_CAR: 0.79%
  label:ON_A_BUS: 0.47%
  label:DRIVE_-_I_M_THE_DRIVER: 2.13%
  label:DRIVE_-_I_M_A_PASSENGER: 0.67%
  label:LOC_home: 38.02%
  label:FIX_restaurant: 0.55%
  label:PHONE_IN_POCKET: 6.26%
  label:OR_exercise: 1.76%
  label:COOKING: 1.06%
  label:SHOPPING: 0.49%
  label:STROLLING: 0.22%
  label:DRINKING__ALCOHOL_: 0.39%
  label:BATHING_-_SHOWER: 0.55%
  label:CLEANING: 1.02%
  label:DOING_LAUNDRY: 0.15%
  label:WASHING_DISHES: 0.33%
  label:WATCHING_TV: 3.52%
  label:SURFING_THE_INTERNET: 5.20%
  label:AT_A_PARTY: 0.39%
  label:AT_A_BAR: 0.15%
  label:LOC_beach: 0.11%
  label:SINGING: 0.18%
  label:TALKING: 9.63%
  label:COMPUTER_WORK: 9.80%
  label:EATING: 3.08%
  label:TOILET: 0.81%
  label:GROOMING: 0.81%
  label:DRESSING: 0.59%
  label:AT_THE_GYM: 0.19%
  label:STAIRS_-_GOING_UP: 0.13%
  label:STAIRS_-_GOING_DOWN: 0.20%
  label:ELEVATOR: 0.05%
  label:OR_standing: 10.02%
  label:AT_SCHOOL: 4.95%
  label:PHONE_IN_HAND: 3.88%
  label:PHONE_IN_BAG: 1.61%
  label:PHONE_ON_TABLE: 30.51%
  label:WITH_CO-WORKERS: 1.66%
  label:WITH_FRIENDS: 4.48%
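The loop above calls predict_proba once per row and per label, which gets slow with 51 labels; predict_proba also accepts the whole test matrix at once. A vectorized sketch, reusing pipelines, X_test, and output_columns from the cell above (and assuming every label had both classes during fitting):

batch_predictions = {
    output_col: pipelines[output_col].predict_proba(X_test)[:, 1]  # column 1 = P(label == 1)
    for output_col in output_columns
}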

In [ ]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
import pandas as pd
import numpy as np
import warnings


warnings.filterwarnings("ignore", message="X does not have valid feature names, but SimpleImputer was fitted with feature names")

# Assumes X (feature columns) and y (label columns) have already been built from combined_csv_data

# Fill missing values in output columns with a default value (e.g., 0)
y.fillna(0, inplace=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Dictionary to store the pipeline for each output column
pipelines = {}

# Train a pipeline for each output column
for output_col in output_columns:
    pipeline = make_pipeline(
        SimpleImputer(strategy='mean'),  # Impute missing values
        LogisticRegression(max_iter=1000)  # Logistic regression
    )
    # Fit the pipeline on the training data
    pipeline.fit(X_train, y_train[output_col])
    pipelines[output_col] = pipeline

# Predicting the probabilities for each row in X_test
predictions = {col: [] for col in output_columns}  # Initialize dictionary to store predictions

for index, row in X_test.iterrows():
    for output_col in output_columns:
        # Predict the probability for the current row and output column
        pred_prob = pipelines[output_col].predict_proba(row.values.reshape(1, -1))[0][1]
        predictions[output_col].append(pred_prob)

# Optionally, print out the predicted probabilities for the first few rows of X_test
for i, (index, row) in enumerate(X_test.iterrows()):
    if i >= 5:  # Limit output to first 5 rows
        break
    print(f"Predictions for row {index}:")
    for output_col in output_columns:
        print(f"  {output_col}: {predictions[output_col][i]:.2%}")
    print()  # Newline for readability
/var/folders/9d/r4wkb8dj54b5k6_vd_8stbq00000gn/T/ipykernel_43744/1352247473.py:16: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  y.fillna(0, inplace=True)
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
Cell In[23], line 30
     25     pipeline = make_pipeline(
     26         SimpleImputer(strategy='mean'),  # Impute missing values
     27         LogisticRegression(max_iter=1000)  # Logistic regression
     28     )
     29     # Fit the pipeline on the training data
---> 30     pipeline.fit(X_train, y_train[output_col])
     31     pipelines[output_col] = pipeline
     33 # Predicting the probabilities for each row in X_test

[Traceback through the library internals (sklearn Pipeline.fit → SimpleImputer.transform → pandas BlockManager._interleave) omitted; the fit was interrupted manually.]

KeyboardInterrupt: 
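The SettingWithCopyWarning above is raised because y.fillna(0, inplace=True) runs while y is a slice of another DataFrame, so pandas cannot guarantee the write reaches the parent frame. A minimal sketch of the usual fix is to take an explicit copy and reassign instead of mutating in place (label_cols is a hypothetical name for the list of label columns):
In [ ]:
# Sketch of the fix for the warning raised in the cell above:
# copy the label slice explicitly and reassign rather than use inplace=True.
y = combined_csv_data[label_cols].copy()  # label_cols: assumed list of label column names
y = y.fillna(0)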
In [ ]:
combined_csv_data.head()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[2], line 1
----> 1 combined_csv_data.head()

NameError: name 'combined_csv_data' is not defined
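The NameError simply means the cell that built combined_csv_data was not re-run in this kernel session. A minimal sketch to restore it from the combined CSV used below (assuming that file was written in an earlier session):
In [ ]:
import pandas as pd

# Reload the combined per-user data so the cells below can run again.
combined_csv_data = pd.read_csv('ExtraSensory_Combined_User_Data.csv')
combined_csv_data.head()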
In [ ]:
import pandas as pd
import numpy as np
from threading import Timer

class Phone:
    def __init__(self, data_df):
        self.data_df = data_df  # Assume data_df is a DataFrame loaded with user data

    def collect_data(self, userid):
        """Collects a random data row for a given user."""
        user_data = self.data_df[self.data_df['user_id'] == userid].sample(n=1)
        return user_data

    def process_data(self, userid, model):
        """Processes data using a specified model."""
        data = self.collect_data(userid)
        # Assuming `model` is a function passed to process the data
        processed_data = model(data)
        return processed_data

    def send_data(self, userid, interval, server):
        """Periodically sends data at specified intervals."""
        data = self.collect_data(userid)
        server.store_update_data(data)
        Timer(interval, self.send_data, args=[userid, interval, server]).start()

class Server:
    def __init__(self):
        self.storage_df = pd.DataFrame()  # Separate DataFrame for storing data

    def request_data(self, phone, userid, raw=True):
        """Requests data from the Phone class."""
        if raw:
            return phone.collect_data(userid)
        else:
            return phone.process_data(userid, self.process_data)  # use the server's own process_data as the processing model

    def process_data(self, data):
        """Processes data."""
        # This is a placeholder for data processing logic, which could involve ML models or other transformations
        processed_data = data  # Simplified for demonstration
        return processed_data

    def store_update_data(self, data):
        """Stores or updates data in a separate DataFrame."""
        self.storage_df = pd.concat([self.storage_df, data], ignore_index=True)
In [ ]:
data_df = pd.read_csv('ExtraSensory_Combined_User_Data.csv')
In [ ]:
phone = Phone(data_df)
server = Server()
In [ ]:
userid="81536B0A-8DBF-4D8A-AC24-9543E2E4C8E0"
raw_data = server.request_data(phone, userid, raw=True)
processed_data = server.request_data(phone, userid, raw=False)
server.store_update_data(raw_data)
server.store_update_data(processed_data)
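Note that Phone.send_data reschedules itself with a fresh Timer on every call but never keeps a handle to it, so the periodic loop cannot be cancelled. A minimal sketch of a stoppable variant, kept outside the class (the stop_event name and the 60-second interval are assumptions):
In [ ]:
from threading import Event, Timer

stop_event = Event()

def send_periodically(phone, userid, interval, server):
    """Send one sample now, then reschedule until stop_event is set."""
    if stop_event.is_set():
        return
    server.store_update_data(phone.collect_data(userid))
    Timer(interval, send_periodically, args=[phone, userid, interval, server]).start()

send_periodically(phone, userid, 60, server)  # one sample every 60 seconds
# ...later, to stop the loop:
# stop_event.set()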
In [ ]:
# df = X and y

# Format data into our structure after prediction

# Tableau

# User - Dynamic
    # User activity against timestamps
    # Probability related - select user and time, (Log Reg) probabilities
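One way to carry out the "format after prediction / Tableau" step sketched above is to pair each test example's timestamp with its predicted label probabilities and write a long-format CSV that Tableau can read directly. A minimal sketch, assuming predictions is the dict of per-label probability arrays built earlier and that X_test still carries its timestamp column (both assumptions about the earlier cells):
In [ ]:
import pandas as pd

# Wide frame: one row per test example, one column per predicted label.
pred_df = pd.DataFrame(predictions)
pred_df['timestamp'] = X_test['timestamp'].values

# Long format (one row per timestamp/label pair) is easier to plot in Tableau.
long_df = pred_df.melt(id_vars='timestamp', var_name='label', value_name='probability')
long_df.to_csv('label_probabilities_for_tableau.csv', index=False)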
In [ ]:
server.storage_df
Out[ ]:
timestamp raw_acc:magnitude_stats:mean raw_acc:magnitude_stats:std raw_acc:magnitude_stats:moment3 raw_acc:magnitude_stats:moment4 raw_acc:magnitude_stats:percentile25 raw_acc:magnitude_stats:percentile50 raw_acc:magnitude_stats:percentile75 raw_acc:magnitude_stats:value_entropy raw_acc:magnitude_stats:time_entropy ... label:ELEVATOR label:OR_standing label:AT_SCHOOL label:PHONE_IN_HAND label:PHONE_IN_BAG label:PHONE_ON_TABLE label:WITH_CO-WORKERS label:WITH_FRIENDS label_source user_id
0 1446541934 1.020442 0.001224 -0.000816 0.001642 1.019684 1.020508 1.021267 2.513081 6.684611 ... NaN 0.0 NaN NaN NaN NaN NaN 0.0 2 81536B0A-8DBF-4D8A-AC24-9543E2E4C8E0
1 1446295467 1.035055 0.004175 0.003772 0.005590 1.032436 1.034281 1.036872 2.462429 6.684604 ... NaN 0.0 NaN NaN NaN NaN NaN 0.0 2 81536B0A-8DBF-4D8A-AC24-9543E2E4C8E0

2 rows × 279 columns

In [ ]:
data_df.head()
Out[ ]:
timestamp raw_acc:magnitude_stats:mean raw_acc:magnitude_stats:std raw_acc:magnitude_stats:moment3 raw_acc:magnitude_stats:moment4 raw_acc:magnitude_stats:percentile25 raw_acc:magnitude_stats:percentile50 raw_acc:magnitude_stats:percentile75 raw_acc:magnitude_stats:value_entropy raw_acc:magnitude_stats:time_entropy ... label:ELEVATOR label:OR_standing label:AT_SCHOOL label:PHONE_IN_HAND label:PHONE_IN_BAG label:PHONE_ON_TABLE label:WITH_CO-WORKERS label:WITH_FRIENDS label_source user_id
0 1446141691 1.009726 0.002838 -0.002296 0.005568 1.008208 1.009735 1.011174 1.572784 6.684608 ... NaN 0.0 NaN NaN NaN NaN NaN 0.0 2 81536B0A-8DBF-4D8A-AC24-9543E2E4C8E0
1 1446141752 1.009822 0.004624 0.003040 0.008459 1.007704 1.009619 1.011857 1.754729 6.684601 ... NaN 0.0 NaN NaN NaN NaN NaN 0.0 2 81536B0A-8DBF-4D8A-AC24-9543E2E4C8E0
2 1446141805 1.009667 0.004781 -0.007802 0.014457 1.008038 1.009772 1.011139 1.012852 6.684600 ... NaN 0.0 NaN NaN NaN NaN NaN 0.0 2 81536B0A-8DBF-4D8A-AC24-9543E2E4C8E0
3 1446141873 1.008839 0.003543 0.001831 0.007082 1.007134 1.008803 1.010433 1.511878 6.684606 ... NaN 0.0 NaN NaN NaN NaN NaN 0.0 2 81536B0A-8DBF-4D8A-AC24-9543E2E4C8E0
4 1446141925 1.008193 0.001753 -0.000744 0.002439 1.007142 1.008234 1.009350 2.347186 6.684610 ... NaN 0.0 NaN NaN NaN NaN NaN 0.0 2 81536B0A-8DBF-4D8A-AC24-9543E2E4C8E0

5 rows × 279 columns

In [ ]:
len(combined_csv_data)
Out[ ]:
377346
In [ ]:
nan_count_full = combined_csv_data[features].isna().sum()
nan_count_sorted_full = nan_count_full.sort_values(ascending=False)
nan_count_sorted_full
Out[ ]:
lf_measurements:proximity                                                  6407
location:best_vertical_accuracy                                            6407
location:max_altitude                                                      6407
location:min_altitude                                                      6407
lf_measurements:screen_brightness                                          6407
watch_heading:std_sin                                                       228
watch_heading:mean_sin                                                      228
watch_heading:mom4_cos                                                      228
watch_heading:mom3_cos                                                      228
watch_heading:std_cos                                                       228
watch_heading:mean_cos                                                      228
watch_heading:mom3_sin                                                      228
watch_heading:mom4_sin                                                      228
watch_heading:entropy_8bins                                                 228
location:max_speed                                                           34
location:min_speed                                                           34
watch_acceleration:magnitude_autocorrelation:period                           7
watch_acceleration:magnitude_spectrum:log_energy_band3                        7
watch_acceleration:magnitude_spectrum:log_energy_band4                        7
watch_acceleration:magnitude_spectrum:spectral_entropy                        7
watch_acceleration:spectrum:x_log_energy_band3                                7
watch_acceleration:magnitude_autocorrelation:normalized_ac                    7
watch_acceleration:3d:mean_x                                                  7
watch_acceleration:3d:mean_y                                                  7
watch_acceleration:magnitude_spectrum:log_energy_band1                        7
watch_acceleration:magnitude_spectrum:log_energy_band2                        7
watch_acceleration:magnitude_stats:moment3                                    7
watch_acceleration:magnitude_spectrum:log_energy_band0                        7
watch_acceleration:magnitude_stats:time_entropy                               7
watch_acceleration:magnitude_stats:value_entropy                              7
watch_acceleration:magnitude_stats:percentile75                               7
watch_acceleration:magnitude_stats:percentile50                               7
watch_acceleration:magnitude_stats:percentile25                               7
watch_acceleration:magnitude_stats:moment4                                    7
watch_acceleration:3d:std_x                                                   7
watch_acceleration:magnitude_stats:std                                        7
watch_acceleration:magnitude_stats:mean                                       7
watch_acceleration:3d:mean_z                                                  7
watch_acceleration:3d:ro_xy                                                   7
watch_acceleration:3d:std_y                                                   7
watch_acceleration:spectrum:y_log_energy_band4                                7
watch_acceleration:relative_directions:avr_cosine_similarity_lag_range4       7
watch_acceleration:relative_directions:avr_cosine_similarity_lag_range3       7
watch_acceleration:relative_directions:avr_cosine_similarity_lag_range2       7
watch_acceleration:relative_directions:avr_cosine_similarity_lag_range1       7
watch_acceleration:relative_directions:avr_cosine_similarity_lag_range0       7
watch_acceleration:spectrum:z_log_energy_band4                                7
watch_acceleration:spectrum:z_log_energy_band3                                7
watch_acceleration:3d:std_z                                                   7
watch_acceleration:spectrum:z_log_energy_band1                                7
watch_acceleration:spectrum:z_log_energy_band0                                7
watch_acceleration:spectrum:z_log_energy_band2                                7
watch_acceleration:spectrum:y_log_energy_band3                                7
watch_acceleration:spectrum:x_log_energy_band1                                7
watch_acceleration:spectrum:y_log_energy_band2                                7
watch_acceleration:3d:ro_yz                                                   7
watch_acceleration:spectrum:x_log_energy_band0                                7
watch_acceleration:3d:ro_xz                                                   7
watch_acceleration:spectrum:x_log_energy_band2                                7
watch_acceleration:spectrum:x_log_energy_band4                                7
watch_acceleration:spectrum:y_log_energy_band0                                7
watch_acceleration:spectrum:y_log_energy_band1                                7
location:best_horizontal_accuracy                                             0
location_quick_features:mean_abs_long_deriv                                   0
location:diameter                                                             0
location:log_diameter                                                         0
location_quick_features:std_lat                                               0
location_quick_features:std_long                                              0
location_quick_features:lat_change                                            0
location_quick_features:long_change                                           0
location_quick_features:mean_abs_lat_deriv                                    0
audio_naive:mfcc0:mean                                                        0
location:log_latitude_range                                                   0
discrete:on_the_phone:is_False                                                0
discrete:ringer_mode:missing                                                  0
discrete:ringer_mode:is_silent_with_vibrate                                   0
discrete:ringer_mode:is_silent_no_vibrate                                     0
discrete:ringer_mode:is_normal                                                0
discrete:on_the_phone:missing                                                 0
discrete:on_the_phone:is_True                                                 0
discrete:battery_state:missing                                                0
audio_naive:mfcc1:mean                                                        0
discrete:battery_state:is_full                                                0
discrete:battery_state:is_charging                                            0
discrete:battery_state:is_discharging                                         0
discrete:battery_state:is_not_charging                                        0
discrete:battery_state:is_unplugged                                           0
discrete:battery_state:is_unknown                                             0
discrete:wifi_status:is_not_reachable                                         0
discrete:wifi_status:is_reachable_via_wifi                                    0
discrete:wifi_status:is_reachable_via_wwan                                    0
discrete:wifi_status:missing                                                  0
lf_measurements:light                                                         0
lf_measurements:pressure                                                      0
lf_measurements:proximity_cm                                                  0
lf_measurements:relative_humidity                                             0
lf_measurements:battery_level                                                 0
lf_measurements:temperature_ambient                                           0
discrete:time_of_day:between0and6                                             0
discrete:time_of_day:between3and9                                             0
discrete:time_of_day:between6and12                                            0
discrete:time_of_day:between9and15                                            0
discrete:time_of_day:between12and18                                           0
discrete:time_of_day:between15and21                                           0
discrete:time_of_day:between18and24                                           0
discrete:battery_plugged:missing                                              0
discrete:battery_plugged:is_wireless                                          0
discrete:battery_plugged:is_usb                                               0
audio_naive:mfcc3:std                                                         0
audio_naive:mfcc2:mean                                                        0
audio_naive:mfcc3:mean                                                        0
audio_naive:mfcc4:mean                                                        0
audio_naive:mfcc5:mean                                                        0
audio_naive:mfcc6:mean                                                        0
audio_naive:mfcc7:mean                                                        0
audio_naive:mfcc8:mean                                                        0
audio_naive:mfcc9:mean                                                        0
audio_naive:mfcc10:mean                                                       0
audio_naive:mfcc11:mean                                                       0
audio_naive:mfcc12:mean                                                       0
audio_naive:mfcc0:std                                                         0
audio_naive:mfcc1:std                                                         0
audio_naive:mfcc2:std                                                         0
audio_naive:mfcc4:std                                                         0
discrete:battery_plugged:is_ac                                                0
audio_naive:mfcc5:std                                                         0
audio_naive:mfcc6:std                                                         0
audio_naive:mfcc7:std                                                         0
audio_naive:mfcc8:std                                                         0
audio_naive:mfcc9:std                                                         0
audio_naive:mfcc10:std                                                        0
audio_naive:mfcc11:std                                                        0
audio_naive:mfcc12:std                                                        0
audio_properties:max_abs_value                                                0
audio_properties:normalization_multiplier                                     0
discrete:app_state:is_active                                                  0
discrete:app_state:is_inactive                                                0
discrete:app_state:is_background                                              0
discrete:app_state:missing                                                    0
location:log_longitude_range                                                  0
timestamp                                                                     0
location:num_valid_updates                                                    0
proc_gyro:magnitude_stats:percentile25                                        0
raw_acc:3d:std_z                                                              0
raw_acc:3d:ro_xy                                                              0
raw_acc:3d:ro_xz                                                              0
raw_acc:3d:ro_yz                                                              0
proc_gyro:magnitude_stats:mean                                                0
proc_gyro:magnitude_stats:std                                                 0
proc_gyro:magnitude_stats:moment3                                             0
proc_gyro:magnitude_stats:moment4                                             0
proc_gyro:magnitude_stats:percentile50                                        0
proc_gyro:magnitude_autocorrelation:period                                    0
proc_gyro:magnitude_stats:percentile75                                        0
proc_gyro:magnitude_stats:value_entropy                                       0
proc_gyro:magnitude_stats:time_entropy                                        0
proc_gyro:magnitude_spectrum:log_energy_band0                                 0
proc_gyro:magnitude_spectrum:log_energy_band1                                 0
proc_gyro:magnitude_spectrum:log_energy_band2                                 0
proc_gyro:magnitude_spectrum:log_energy_band3                                 0
proc_gyro:magnitude_spectrum:log_energy_band4                                 0
raw_acc:3d:std_y                                                              0
raw_acc:3d:std_x                                                              0
raw_acc:3d:mean_z                                                             0
raw_acc:3d:mean_y                                                             0
raw_acc:magnitude_stats:std                                                   0
raw_acc:magnitude_stats:moment3                                               0
raw_acc:magnitude_stats:moment4                                               0
raw_acc:magnitude_stats:percentile25                                          0
raw_acc:magnitude_stats:percentile50                                          0
raw_acc:magnitude_stats:percentile75                                          0
raw_acc:magnitude_stats:value_entropy                                         0
raw_acc:magnitude_stats:time_entropy                                          0
raw_acc:magnitude_spectrum:log_energy_band0                                   0
raw_acc:magnitude_spectrum:log_energy_band1                                   0
raw_acc:magnitude_spectrum:log_energy_band2                                   0
raw_acc:magnitude_spectrum:log_energy_band3                                   0
raw_acc:magnitude_spectrum:log_energy_band4                                   0
raw_acc:magnitude_spectrum:spectral_entropy                                   0
raw_acc:magnitude_autocorrelation:period                                      0
raw_acc:magnitude_autocorrelation:normalized_ac                               0
raw_acc:3d:mean_x                                                             0
proc_gyro:magnitude_spectrum:spectral_entropy                                 0
proc_gyro:magnitude_autocorrelation:normalized_ac                             0
raw_acc:magnitude_stats:mean                                                  0
raw_magnet:3d:std_y                                                           0
raw_magnet:magnitude_spectrum:log_energy_band4                                0
raw_magnet:magnitude_spectrum:spectral_entropy                                0
raw_magnet:magnitude_autocorrelation:period                                   0
raw_magnet:magnitude_autocorrelation:normalized_ac                            0
raw_magnet:3d:mean_x                                                          0
raw_magnet:3d:mean_y                                                          0
raw_magnet:3d:mean_z                                                          0
raw_magnet:3d:std_x                                                           0
raw_magnet:3d:std_z                                                           0
proc_gyro:3d:mean_x                                                           0
raw_magnet:3d:ro_xy                                                           0
raw_magnet:3d:ro_xz                                                           0
raw_magnet:3d:ro_yz                                                           0
raw_magnet:avr_cosine_similarity_lag_range0                                   0
raw_magnet:avr_cosine_similarity_lag_range1                                   0
raw_magnet:avr_cosine_similarity_lag_range2                                   0
raw_magnet:avr_cosine_similarity_lag_range3                                   0
raw_magnet:avr_cosine_similarity_lag_range4                                   0
raw_magnet:magnitude_spectrum:log_energy_band3                                0
raw_magnet:magnitude_spectrum:log_energy_band2                                0
raw_magnet:magnitude_spectrum:log_energy_band1                                0
raw_magnet:magnitude_spectrum:log_energy_band0                                0
proc_gyro:3d:mean_y                                                           0
proc_gyro:3d:mean_z                                                           0
proc_gyro:3d:std_x                                                            0
proc_gyro:3d:std_y                                                            0
proc_gyro:3d:std_z                                                            0
proc_gyro:3d:ro_xy                                                            0
proc_gyro:3d:ro_xz                                                            0
proc_gyro:3d:ro_yz                                                            0
raw_magnet:magnitude_stats:mean                                               0
raw_magnet:magnitude_stats:std                                                0
raw_magnet:magnitude_stats:moment3                                            0
raw_magnet:magnitude_stats:moment4                                            0
raw_magnet:magnitude_stats:percentile25                                       0
raw_magnet:magnitude_stats:percentile50                                       0
raw_magnet:magnitude_stats:percentile75                                       0
raw_magnet:magnitude_stats:value_entropy                                      0
raw_magnet:magnitude_stats:time_entropy                                       0
discrete:time_of_day:between21and3                                            0
dtype: int64
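Given these counts, one option before fitting is to drop the sparsest feature columns (the proximity, altitude, vertical-accuracy and screen-brightness ones, at roughly 6,400 missing rows each) and leave the occasional gaps to the imputer. A minimal sketch; the 1% threshold is an arbitrary assumption:
In [ ]:
# Drop feature columns where more than 1% of rows are missing;
# SimpleImputer in the pipeline still handles the remaining gaps.
nan_fraction = combined_csv_data[features].isna().mean()
sparse_cols = nan_fraction[nan_fraction > 0.01].index.tolist()
features_kept = [f for f in features if f not in sparse_cols]
print(f"Dropping {len(sparse_cols)} sparse feature columns: {sparse_cols}")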
In [ ]:
hierarchy = build_hierarchy(X.columns)
formatted_hierarchy = format_hierarchy(hierarchy)
print(formatted_hierarchy)
- raw_acc:
  - magnitude_stats:
    -  mean
    -  std
    -  moment3
    -  moment4
    -  percentile25
    -  percentile50
    -  percentile75
    -  value_entropy
    -  time_entropy
  - magnitude_spectrum:
    -  log_energy_band0
    -  log_energy_band1
    -  log_energy_band2
    -  log_energy_band3
    -  log_energy_band4
    -  spectral_entropy
  - magnitude_autocorrelation:
    -  period
    -  normalized_ac
  - 3d:
    -  mean_x
    -  mean_y
    -  mean_z
    -  std_x
    -  std_y
    -  std_z
    -  ro_xy
    -  ro_xz
    -  ro_yz
- proc_gyro:
  - magnitude_stats:
    -  mean
    -  std
    -  moment3
    -  moment4
    -  percentile25
    -  percentile50
    -  percentile75
    -  value_entropy
    -  time_entropy
  - magnitude_spectrum:
    -  log_energy_band0
    -  log_energy_band1
    -  log_energy_band2
    -  log_energy_band3
    -  log_energy_band4
    -  spectral_entropy
  - magnitude_autocorrelation:
    -  period
    -  normalized_ac
  - 3d:
    -  mean_x
    -  mean_y
    -  mean_z
    -  std_x
    -  std_y
    -  std_z
    -  ro_xy
    -  ro_xz
    -  ro_yz
- raw_magnet:
  - magnitude_stats:
    -  mean
    -  std
    -  moment3
    -  moment4
    -  percentile25
    -  percentile50
    -  percentile75
    -  value_entropy
    -  time_entropy
  - magnitude_spectrum:
    -  log_energy_band0
    -  log_energy_band1
    -  log_energy_band2
    -  log_energy_band3
    -  log_energy_band4
    -  spectral_entropy
  - magnitude_autocorrelation:
    -  period
    -  normalized_ac
  - 3d:
    -  mean_x
    -  mean_y
    -  mean_z
    -  std_x
    -  std_y
    -  std_z
    -  ro_xy
    -  ro_xz
    -  ro_yz
  -  avr_cosine_similarity_lag_range0
  -  avr_cosine_similarity_lag_range1
  -  avr_cosine_similarity_lag_range2
  -  avr_cosine_similarity_lag_range3
  -  avr_cosine_similarity_lag_range4
- location:
  -  num_valid_updates
  -  log_latitude_range
  -  log_longitude_range
  -  best_horizontal_accuracy
  -  diameter
  -  log_diameter
- location_quick_features:
  -  std_lat
  -  std_long
  -  lat_change
  -  long_change
  -  mean_abs_lat_deriv
  -  mean_abs_long_deriv
- audio_naive:
  - mfcc0:
    -  mean
    -  std
  - mfcc1:
    -  mean
    -  std
  - mfcc2:
    -  mean
    -  std
  - mfcc3:
    -  mean
    -  std
  - mfcc4:
    -  mean
    -  std
  - mfcc5:
    -  mean
    -  std
  - mfcc6:
    -  mean
    -  std
  - mfcc7:
    -  mean
    -  std
  - mfcc8:
    -  mean
    -  std
  - mfcc9:
    -  mean
    -  std
  - mfcc10:
    -  mean
    -  std
  - mfcc11:
    -  mean
    -  std
  - mfcc12:
    -  mean
    -  std
- audio_properties:
  -  max_abs_value
  -  normalization_multiplier
- discrete:
  - app_state:
    -  is_active
    -  is_inactive
    -  is_background
    -  missing
  - battery_plugged:
    -  is_ac
    -  is_usb
    -  is_wireless
    -  missing
  - battery_state:
    -  is_unknown
    -  is_unplugged
    -  is_not_charging
    -  is_discharging
    -  is_charging
    -  is_full
    -  missing
  - on_the_phone:
    -  is_False
    -  is_True
    -  missing
  - ringer_mode:
    -  is_normal
    -  is_silent_no_vibrate
    -  is_silent_with_vibrate
    -  missing
  - wifi_status:
    -  is_not_reachable
    -  is_reachable_via_wifi
    -  is_reachable_via_wwan
    -  missing
  - time_of_day:
    -  between0and6
    -  between3and9
    -  between6and12
    -  between9and15
    -  between12and18
    -  between15and21
    -  between18and24
    -  between21and3
- lf_measurements:
  -  battery_level
-  timestamp_numeric